Name Modified Size
arm64-v8a-android-1.13.2.tgz 2023-10-23 674.8 MB
arm64-v8a-linux-1.13.2.tgz 2023-10-23 685.7 MB
armeabi-v7a-android-1.13.2.tgz 2023-10-23 589.2 MB
armeabi-v7a-hardfp-linux-1.13.2.tgz 2023-10-23 588.7 MB
armeabi-v7a-softfp-linux-1.13.2.tgz 2023-10-23 588.5 MB
MegEngine v1.13.2 source code.tar.gz 2023-10-13 7.9 MB
MegEngine v1.13.2 source code.zip 2023-10-13 11.2 MB
README.md 2023-10-13 5.2 kB
Totals: 8 Items, 3.1 GB

MegEngine

Known Issues

  • With cuda118, a resource destruction exception may occur when using TensorRT for inference.
  • The training benchmark metric avg_cpu_usage shows an average increase of 32.4% compared with the previous two versions.

Bug Fixes

Python API

  • Fix the issue where the arange function could not set device to cpu.

Third-party hardware

  • Fix atlas reporting insufficient event resources in multi-model, multi-threaded environments.
  • Fix the requirement to activate atlas_env when synchronizing on atlas.
  • Fix a memory leak caused by tensordesc not being released.
  • Fix repeated aclInit calls.

Common components

  • Fix the incorrect initialization order of static variables when a custom op implements a builtin op.
  • Fix the risk of memory corruption on android caused by megengine's dependency on setenv.

XLA

  • Fix increased GPU memory usage, ptxas not being found, and incorrect rng seed settings when using xla.

ARM

  • Upgrade the ndk to r25c to fix -D_FORTIFY_SOURCE=2 not taking effect on armv7 with older ndk versions.
  • Fix an out-of-bounds memory access in the conv_backdata operator.
  • Optimize compilation speed; builds for android devices can be 30% faster.

New Features

Python API

  • Add support for high-dimensional sort on the python side.
  • Add flip, rotate, resize and rot90 operators.

Peripheral tools

  • Support forward compatibility of dumped models in MegBrain v8.14.

Common components

  • Add a kernel implementation of the where operator.

XLA

  • Functions wrapped with partial_trace now fall back to the original python function when the input shape changes.
  • partial_trace supports compiling collective communication operators such as all_reduce into the xla executable, improving the performance of xla-traced models.
  • partial_trace supports tracing optimizer._update, which accelerates the optimizer step method.
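
The shape-change fallback can be pictured with a small standard-library sketch. It is not MegEngine's implementation, which these notes do not describe; `trace_with_fallback`, `fake_compile`, and `shape_of` are hypothetical names, and flat Python lists stand in for tensors.

```python
from functools import wraps


def shape_of(x):
    """Shape of a flat Python list, standing in for a tensor shape."""
    return (len(x),)


def trace_with_fallback(compile_fn):
    """Compile for the first input shape seen; use the plain function otherwise."""
    def decorator(fn):
        state = {"shape": None, "compiled": None}

        @wraps(fn)
        def wrapper(x):
            if state["compiled"] is None:
                # First call: remember the shape and build the compiled version.
                state["shape"] = shape_of(x)
                state["compiled"] = compile_fn(fn)
            if shape_of(x) == state["shape"]:
                return state["compiled"](x)
            # Shape changed: fall back to the original python function.
            return fn(x)

        return wrapper

    return decorator


calls = []  # records each call handled by the compiled path


def fake_compile(fn):
    """Stand-in for a compiler (assumption): wraps fn and logs its use."""
    def compiled(x):
        calls.append("compiled")
        return fn(x)
    return compiled


@trace_with_fallback(fake_compile)
def double(x):
    return [2 * v for v in x]


first = double([1, 2, 3])   # first call fixes the traced shape (3,)
second = double([4, 5])     # different shape, so the plain function runs
```

Both calls return correct results, but only the first goes through the compiled path, so a shape change costs performance rather than correctness.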

CUDA

  • Add three mixup gpu implementations (cutmix, fmix, mixup).
  • Add support for the cropandpad operator.
  • Add support for elemwise uint16 dtype calculations.

Dataloader

  • Add dataloader monitoring for each stage of data processing, enabled by setting the environment variable os.environ['MGE_DATA_MONITOR'] = '1'. When num_workers = 0, it reports the data fetching time dataset_time, the data transformation time transform_time, and the batch collation time collate_time. When num_workers > 0, it additionally reports the inter-process communication time IPC_time.
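
To show what the three single-process timings cover, here is a minimal standard-library sketch. It is not MegEngine code: `load_batch` is a hypothetical helper, and the environment variable only gates this sketch. How the real monitor reports its numbers is not described in these notes.

```python
import os
import time

# Variable name taken from the release notes; here it only gates the sketch.
os.environ["MGE_DATA_MONITOR"] = "1"


def load_batch(dataset, indices, transform, collate):
    """Build one batch and time each stage (hypothetical helper)."""
    monitor = os.environ.get("MGE_DATA_MONITOR") == "1"
    timings = {}

    start = time.perf_counter()
    samples = [dataset[i] for i in indices]        # fetch raw samples
    timings["dataset_time"] = time.perf_counter() - start

    start = time.perf_counter()
    samples = [transform(s) for s in samples]      # per-sample transform
    timings["transform_time"] = time.perf_counter() - start

    start = time.perf_counter()
    batch = collate(samples)                       # assemble the batch
    timings["collate_time"] = time.perf_counter() - start

    return batch, (timings if monitor else None)


batch, timings = load_batch(
    dataset=list(range(10)),
    indices=[0, 1, 2, 3],
    transform=lambda x: x * 2,
    collate=list,
)
```

With worker processes (num_workers > 0) there is one more stage, passing the batch between processes, which is what IPC_time refers to.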

Improvements

Documentation

  • Improve the docstrings of existing APIs.

MegEngine Lite

Bug Fixes

Common components

  • Fix an intermittent error when calling get_io_tensor to obtain the device type.
Source: README.md, updated 2023-10-13