MegEngine Files

Easy-to-use deep learning framework with 3 key features

This is an exact mirror of the MegEngine project, hosted at https://github.com/MegEngine/MegEngine. SourceForge is not affiliated with MegEngine. For more information, see the SourceForge Open Source Mirror Directory.

Name	Modified	Size	Downloads / Week
arm64-v8a-android-1.13.2.tgz	2023-10-23	674.8 MB	0
arm64-v8a-linux-1.13.2.tgz	2023-10-23	685.7 MB	0
armeabi-v7a-android-1.13.2.tgz	2023-10-23	589.2 MB	0
armeabi-v7a-hardfp-linux-1.13.2.tgz	2023-10-23	588.7 MB	0
armeabi-v7a-softfp-linux-1.13.2.tgz	2023-10-23	588.5 MB	0
MegEngine v1.13.2 source code.tar.gz	2023-10-13	7.9 MB	0
MegEngine v1.13.2 source code.zip	2023-10-13	11.2 MB	0
README.md	2023-10-13	5.2 kB	0
Totals: 8 Items		3.1 GB	0

MegEngine

Known Issues

With cuda118, a resource destruction exception may occur when using TensorRT for inference.
The avg_cpu_usage metric of the training benchmark is on average 32.4% higher than in the previous two releases.

Bug Fixes

Python API

Fix arange being unable to set device to cpu.

Third-party hardware

Fix atlas reporting insufficient event resources in multi-model, multi-threaded environments.
Fix atlas synchronization requiring atlas_env to be activated.
Fix a memory leak caused by tensordesc not being released.
Fix repeated aclInit calls.

Common components

Fix the incorrect initialization order of static variables when a custom op implements a builtin op.
Fix a memory corruption risk on android caused by MegEngine's dependency on setenv.

XLA

Fix increased GPU memory usage, failure to find ptxas, and incorrect rng seed settings when using xla.

ARM

Upgrade ndk to r25c, fixing -D_FORTIFY_SOURCE=2 not taking effect on armv7 with older ndk versions.
Fix an out-of-bounds memory access in the conv_backdata operator.
Improve compilation speed; builds for android devices are up to 30% faster.

New Features

Python API

Add high-dimensional sort support in the Python API.
Add flip, rotate, resize and rot90 operators.
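
As a rough illustration of what flip and rot90 compute, the plain-Python sketch below applies them to a small 2-D list. It does not call MegEngine: the helper names `flip_horizontal`, `flip_vertical`, and `rot90_ccw` are hypothetical, and the counterclockwise direction follows NumPy's rot90 convention, which these notes do not confirm for MegEngine.

```python
def flip_horizontal(matrix):
    # Reverse the order of elements within each row.
    return [row[::-1] for row in matrix]

def flip_vertical(matrix):
    # Reverse the order of the rows.
    return matrix[::-1]

def rot90_ccw(matrix):
    # Transpose, then reverse the row order: a 90-degree
    # counterclockwise rotation (assumed convention, as in NumPy).
    return [list(row) for row in zip(*matrix)][::-1]

image = [[1, 2],
         [3, 4]]

flipped_h = flip_horizontal(image)   # [[2, 1], [4, 3]]
flipped_v = flip_vertical(image)     # [[3, 4], [1, 2]]
rotated = rot90_ccw(image)           # [[2, 4], [1, 3]]
```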

Peripheral tools

Support forward compatibility of dumped models with MegBrain v8.14.

Common components

Add a kernel implementation of the where operator.

XLA

Functions wrapped with partial_trace now fall back to the original python function when the input shape changes.
partial_trace can compile collective communication operators such as all_reduce into the xla executable, improving the performance of xla-traced models.
partial_trace supports tracing optimizer._update, which accelerates the optimizer step method.

CUDA

Add gpu implementations of three mixup variants (cutmix, fmix, mixup).
Add support for the cropandpad operator.
Add support for elemwise uint16 dtype calculations.
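
For readers unfamiliar with the plain mixup variant named above, here is a minimal sketch of the blending step in pure Python. It is not MegEngine's gpu implementation: the function `mixup` and its arguments are hypothetical, and cutmix and fmix (which mix image regions rather than whole samples) are not shown.

```python
def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their one-hot labels with weight lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    # Convex combination of the inputs ...
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    # ... and the same combination of the labels.
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y

mixed_x, mixed_y = mixup(
    x_a=[1.0, 2.0], y_a=[1.0, 0.0],
    x_b=[3.0, 4.0], y_b=[0.0, 1.0],
    lam=0.25,
)
```

In training, the weight is typically sampled per batch from a Beta distribution rather than fixed as it is here.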

Dataloader

Add dataloader monitoring for each stage of data processing, enabled by setting os.environ['MGE_DATA_MONITOR'] = '1'. When num_workers = 0, it reports the data fetching time dataset_time, the data transformation time transform_time, and the batch collation time collate_time. When num_workers > 0, it additionally reports the inter-process communication time IPC_time.
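
To make the three single-process timings concrete, the sketch below times the fetch, transform, and collate stages of a toy pipeline in plain Python. It does not use MegEngine's dataloader: `fetch`, `transform`, `collate`, and `timed_batch` are hypothetical stand-ins, and only the MGE_DATA_MONITOR variable and the metric names come from these notes. Setting the variable before the dataloader is created is an assumption, and how MegEngine reports the values is not shown.

```python
import os
import time

# From the release notes: this switch enables the dataloader monitor.
# Setting it before constructing the dataloader is an assumption.
os.environ["MGE_DATA_MONITOR"] = "1"

def fetch(index):
    # Hypothetical dataset access (the stage dataset_time would cover).
    return [index, index + 1]

def transform(sample):
    # Hypothetical per-sample transform (the stage transform_time would cover).
    return [value * 2 for value in sample]

def collate(samples):
    # Hypothetical batching step (the stage collate_time would cover).
    return [list(column) for column in zip(*samples)]

def timed_batch(indices):
    timings = {}

    start = time.perf_counter()
    raw = [fetch(i) for i in indices]
    timings["dataset_time"] = time.perf_counter() - start

    start = time.perf_counter()
    transformed = [transform(s) for s in raw]
    timings["transform_time"] = time.perf_counter() - start

    start = time.perf_counter()
    batch = collate(transformed)
    timings["collate_time"] = time.perf_counter() - start

    return batch, timings

batch, timings = timed_batch([0, 1, 2])
```

With worker processes, a fourth measurement would cover the time spent moving batches between processes, which corresponds to IPC_time in the notes.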

Improvements

Documentation

Improve the docstrings of existing APIs.

MegEngine Lite

Bug Fixes

Common components

Fix intermittent errors when calling get_io_tensor to obtain the device type.

Source: README.md, updated 2023-10-13
