Apache TVM v0.20.0 Release Notes

Introduction

The TVM community has worked since the last release to deliver the following exciting new improvements!

The main tags are below (bold text indicates areas with significant progress): Relax (especially the PyTorch frontend), CUDA, etc.

Please visit the full listing of commits for a complete view: v0.20.dev0...v0.20.0.rc0.
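
For orientation, here is a minimal sketch of the Relax ExportedProgram import path highlighted above. It assumes a PyTorch 2.x environment with torch.export and the tvm.relax.frontend.torch.from_exported_program entry point; TinyModel and its shapes are hypothetical placeholders, and softplus is used only because it is among the ops added to the ExportedProgram importer in this release (#17806).

```python
# Minimal sketch (assumptions noted above): import a torch.export-ed module
# into a Relax IRModule. TinyModel is a hypothetical toy module used only
# for illustration.
import torch
from tvm.relax.frontend.torch import from_exported_program


class TinyModel(torch.nn.Module):
    def forward(self, x):
        # softplus is one of the ops covered by the ExportedProgram importer
        return torch.nn.functional.softplus(x) + x


example_args = (torch.randn(4, 8),)
exported = torch.export.export(TinyModel().eval(), example_args)

# Translate the ExportedProgram into a Relax IRModule and print it.
mod = from_exported_program(exported)
mod.show()
```

The resulting IRModule can then be transformed and compiled with the usual Relax pipeline.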

Community

None.

RFCs

None.

Adreno

  • #17608 - [WINDOWS] Windows build dependencies for Adreno target

BugFix

  • #17761 - [FIX][RELAX] Fix fusion of transpose + matmul with a constant weight
  • #17762 - [Fix] Fix OpenCL header in attention utils
  • #17711 - [Fix][dlight] add an explicit reduction loop check in Reduce
  • #17697 - [Fix] Include <chrono> for std::chrono
  • #17677 - Declare build backend for python package
  • #17598 - [TIR][FIX] Update FlopEstimator to include missing nodes
  • #17601 - [Flashinfer][Fix] Fix missing args in flashinfer test
  • #17607 - [FIX][TVMC] Fix the mixed precision conversion pipeline

CI

  • #17687 - Update images to 20250226-223225-63bc315f
  • #17680 - Update images to 20250225-035137-aeadc31c
  • #17675 - [skip ci] Update github tvmbot
  • #17635 - Cleanup legacy files
  • #17634 - [skip ci] Improve build time
  • #17629 - [skip ci] Robustify CI for SPOT failure
  • #17620 - Unpin pytest-profiling
  • #17621 - [skip ci] Remove legacy CI runners protection
  • #17619 - [Refactor] Remove legacy frontend tests

Dlight

  • #17754 - Fix general reduction rule to support non-last reduction axis
  • #17663 - [CPU] Add CPU Backend Support for GEMV Optimization

Docker

  • #17691 - Fix ml_dtypes downgrade issue introduced by TensorFlow
  • #17686 - Update ml_dtypes to 0.5.1+
  • #17676 - Use Torch GPU on gpu device
  • #17648 - Tensorflow (aka TFLite) upgrade to 2.18.0
  • #17643 - Update ml_dtypes version
  • #17638 - [skip ci] Update ml_dtypes version
  • #17617 - Tensorflow upgrade to 2.18.0

Docs

  • #17650 - Update README
  • #17611 - Download 3rd party embeds to local files
  • #17604 - Update README

MetaSchedule

  • #17104 - Adding post optimization in MetaSchedule to Improve Scheduling

OpenCL & CLML

  • #17571 - [OPENCL][TEXTURE] Improved texture memory planning

Relax

  • #17814 - [PyTorch] Add stack.default and sum.default to exported programs translator
  • #17820 - [PyTorch] Add support for broadcast_to, narrow ops
  • #17822 - [PyTorch] Cleanup tests for ExportedProgram frontend
  • #17806 - [PyTorch] Add Softplus Op Support for Exported Program and FX graph
  • #17817 - [PyTorch] Support dynamic shapes in ExportedProgram frontend
  • #17813 - [PyTorch] Improve ExportedProgram frontend by supporting unflatten.int, hardtanh_.default, dropout_.default, silu_.default, add_.Tensor and relu_.default
  • #17812 - [PyTorch] Support argsort, topk ops for ExportedProgram importer
  • #17810 - [PyTorch] Add support for argsort, sort, topk ops
  • #17809 - [PyTorch] Delete duplicate converter function _to
  • #17807 - [PyTorch] Fix torch 2.6 compatibility issues
  • #17797 - [PyTorch] Update SELU Implementation Using Decomposed Core-Level Ops
  • #17802 - [PyTorch] Support for arange in exported programs translator
  • #17801 - [PyTorch] Support where, cumprod and reciprocal ops for ExportedProgram importer
  • #17790 - [PyTorch] Add support for index_select
  • #17786 - [PyTorch] Support softshrink op for ExportedProgram
  • #17788 - [PyTorch] Add support for where, cumprod and reciprocal ops
  • #17785 - [PyTorch] Support prod, std and var ops for ExportedProgram importer
  • #17778 - [PyTorch] Support log2, log10 and log1p ops for ExportedProgram importer
  • #17772 - [PyTorch] Add support for prod, std and var ops
  • #17766 - [PyTorch] Add support for log2, log10 and log1p ops
  • #17760 - [PyTorch] Add support for lerp, select and clone ops
  • #17751 - [PyTorch] Support one_hot, empty_like ops for ExportedProgram importer
  • #17747 - [PyTorch] Support flip, gather, take ops for ExportedProgram importer
  • #17738 - [PyTorch] Support elu, celu, selu ops for ExportedProgram importer
  • #17726 - [PyTorch] Add support for numel, empty_like and one_hot ops
  • #17707 - [PyTorch] Add support for gather, flip and take ops
  • #17702 - [PyTorch] Add support for celu, selu, is_floating_point ops
  • #17694 - [PyTorch] Add support for elu, hardtanh ops
  • #17689 - [PyTorch] Support several binary ops for ExportedProgram importer
  • #17672 - [PyTorch] Refactor binary ops tests
  • #17679 - [PyTorch] Support several unary ops for ExportedProgram importer
  • #17668 - [PyTorch] Add support for and_, lshift, min, or_, rshift, xor ops
  • #17664 - [PyTorch] Add support for ge, gt, le, mod, ne ops
  • #17659 - [PyTorch] Add support for bitwise_not, isfinite, isinf, isnan, logical_not, sign and square ops
  • #17622 - [PyTorch] Add support for abs, ceil, erf, floor, log ops and refactor unary tests
  • #17566 - [ONNX] Add prim expression support to Neg converter and update Arange converter to use relax.op.arange
  • #17642 - [ONNX] Replace topi.split with relax.op.split in the onnx frontend
  • #17674 - [KVCache] PagedKVCache refactor, FlashInfer JIT and MLA integration
  • #17618 - [KVCache] TIR attention kernel support for MLA
  • #17615 - [KVCache] Add KV Cache for CPU Runtime
  • #17616 - [Runtime][KVCache] Initial interface setup for MLA
  • #17782 - [Frontend] Support max/min in frontend op interface
  • #17758 - Allow ingesting tensor.chunk() from exported torch program
  • #17781 - Enable bfloat16 for softmax struct-info inference
  • #17752 - Batch norm correctness on eval mode
  • #17774 - Check for tensor_meta in exported_program_translator
  • #17757 - Tensor.split with uneven tensors
  • #17749 - Move TIR backend to gpu_generic
  • #17725 - Ingest Tensor.clamp from torch export
  • #17724 - Add support to ingest Tensor.expand_as()
  • #17723 - Add torch exported program ingestion capability for Tensor.detach(), Tensor.copy_, and aten.lift_fresh_copy
  • #17721 - Allow ingesting Upsample module from torch.export either using Size or Scale Factor argument
  • #17722 - Allow ingesting vector_norm from torch.export
  • #17728 - Ingest Tensor.contiguous from torch export
  • #17700 - Fix tree attention for Qwen2-1.5 models
  • #17682 - Add support for func attr inheritance in SplitLayoutRewritePreproc
  • #17654 - [BYOC] OpenCLML offload support for Relax
  • #17633 - Pipeline file reorganization
  • #17626 - Initial setup of relax backend pipeline
  • #17568 - [PASS] Convert layout pass and ops enhanced to support sub indexing

Runtime

  • #17614 - [CLML] Profiling options enabled for CLML
  • #17570 - [OPENCL] Bugfix

TIR

  • #17799 - Fix reduce buffer allocation position
  • #17783 - [REFACTOR] Remove legacy tir::any
  • #17706 - Minor fix for default GPU schedule
  • #17579 - [SoftwarePipeline] Ensure pipeline epilogue and prologue do not overlap
  • #17584 - [LoopPartition] enforcement on loop partition control

TVMC

None.

cuda & cutlass & tensorrt

  • #17789 - [CUTLASS] Add blockwise scale gemm/bmm kernels
  • #17741 - [Codegen][CUDA] Fix codegen of cast among vector bfloat16, fp8 and fp4
  • #17708 - [CUDA] FP4 cast and reinterpret support
  • #17639 - [CUDA] Remove htanh from unsupported math ops for CUDA 12.8
  • #16950 - [Codegen, CUDA] Add FP8 Tensor Core Codegen

web

  • #17695 - [WASM] Update wasm include in accordance to kv cache revamp

Misc

  • #17796 - [Cublas] Added support for bfloat16 while dispatching to cublas kernels
  • #17763 - [Flashinfer] Added jit flow for sampling kernel
  • #17811 - [NFC] Fix explict typo
  • #17780 - [3rdparty] Enable bfloat16 for custom allreduce kernel
  • #17784 - [REFACTOR] Phase out StackVM
  • #17750 - BugFix: Relax comment
  • #17748 - [Codegen] Support codegen for vectorized tir.ShuffleNode
  • #17743 - Fix: Change variable i to x in split operation in cross_compilation_and_rpc.py
  • #17730 - [Attention] Added caching for flashinfer binaries during JIT
  • #17733 - [Refactor] Clean up Relay references in the codebase
  • #17739 - [BF16] Support ndarray.asnumpy() to bfloat16 tensor natively using ml_dtypes
  • #17734 - Remove Google Analytics
  • #17731 - [IR] Compact Functor vtable
  • #17736 - Fix typos in comments and strings
  • #17670 - [DataType] BF16 Support
  • #17727 - [FFI] Fix dynamic FFI index to ensure compatibility
  • #17718 - [Refactor] Migrate build API to tvm.compile
  • #17714 - [FFI] Phase out ctypes fallback in favor of cython
  • #17716 - Fix the get_target_compute_version for sm >= 100
  • #17710 - [Refactor] Introduce base Executable class and tvm.compile interface
  • #17713 - [REFACTOR] Cleanup legacy relay runtime data structures
  • #17712 - [DataType] Rename FP8 dtypes to standard names
  • #17703 - Fix typos in multiple files
  • #17693 - Updated the assert in BindParams to allow tvm.relax.Constant
  • #17701 - [Refactor] Remove legacy TE schedule tag
  • #17683 - [MSC] Remove relay
  • #17688 - Fix relax.ccl.scatter_from_worker0 assert
  • #17630 - [Codegen] FP4 support
  • #17685 - [REFACTOR] Cleanup legacy TE-based passes
  • #17681 - [REFACTOR] Followup cleanup of relay phase out
  • #17678 - Bump 3rdparty/cutlass_fpA_intB_gemm
  • #17669 - [REFACTOR] Allow target dependent default tir pipeline dispatch in tir.build()
  • #17665 - [REFACTOR] move build flow from C++ to Python
  • #17624 - Added support for normal MLA kernel
  • #17641 - Pick up vector length from 'zvlXXXb' (RVV) mattr for riscv
  • #17666 - [Refactor] Improve TargetHasSVE function with optional target handling
  • #17661 - [Refactor] Phase out python dependency decorator
  • #17662 - [REFACTOR] Phase out te.Schedule c++ components
  • #17660 - [REFACTOR] Phase out relay c++ components
  • #17655 - Upgrading onnx and onnxrt versions
  • #17657 - Update argument order for relax.op.pad to make it round-trippable
  • #17658 - [REFACTOR] Phase out te.schedule python components
  • #17653 - Update images to 20250214-034537-bd1411f8
  • #17656 - [REFACTOR] Phase out relay python components
  • #17649 - [Refactor] Phase out python dependency attrs
  • #17644 - Bump rollup from 2.79.1 to 2.79.2 in /web
  • #17637 - [PYTHON] Build cython by default
  • #17631 - Handle vector width (VLEN) for RISCV arches
  • #17613 - Bug Fix: Removed unused code
  • #17585 - [Relay] Disable InferType if it was already done and no changes were made by the previous pass
  • #17605 - [Refactor] Phase out legacy example apps
  • #17603 - [Refactor] Phase out legacy docs
  • #17513 - [GRAPH RT] Additional API support