IREE Release v3.8.0
1. Compiler
1.1 Data Tiling & Scaled Matmul
- Introduced DataTiledScaledMMAAttr and implemented scaled matmul data tiling materialization using new scaled intrinsic attributes for improved codegen flexibility. (https://github.com/iree-org/iree/pull/22176, https://github.com/iree-org/iree/pull/22189)
- Added ping-pong ukernel support for FP8 and FP16 data tiling, tuned for LLaMA workloads, delivering up to 30–40% latency reduction vs. non–data-tiled paths. (https://github.com/iree-org/iree/pull/21919)
- Added ROCm encoding specialization via UKernelProviderInterface for data-tiled ukernels. (https://github.com/iree-org/iree/pull/21914)
- Introduced intentional padded configurations for (I)GEMM to improve convolution performance by ~8% with no degradation in backward paths. (https://github.com/iree-org/iree/pull/21931)
- Disabled data-tiling by default for CPU backends due to memory and backend inconsistencies; it’s now opt-in via --iree-opt-data-tiling, with updated CPU docs and tests reflecting the change. (https://github.com/iree-org/iree/pull/21935)
- Published a detailed blog post on data tiling that explains how operand layouts are transformed to match hardware-preferred formats for better locality and cache efficiency. (https://iree.dev/community/blog/2025-08-25-data-tiling-walkthrough/)
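The following is a rough illustration of the relayout data tiling performs. The Python sketch packs a row-major M×K matrix into contiguous m0×k0 tiles indexed as [mo][ko][mi][ki]. This is similar in spirit to a pack operation feeding an `mmt4d`-style matmul. The `pack_2d` helper and the 2×2 tile size are hypothetical choices for illustration. IREE selects real tile sizes per target when it materializes encodings, and this is not IREE's implementation.

```python
def pack_2d(matrix, m0, k0, pad_value=0):
    """Pack a row-major M x K matrix (list of lists) into m0 x k0 tiles.

    The result is a 4-D nested list indexed as [mo][ko][mi][ki], where
    element (mo, ko, mi, ki) comes from matrix[mo * m0 + mi][ko * k0 + ki].
    Positions past the edge of the matrix are filled with pad_value.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    outer_m = -(-rows // m0)  # ceil(rows / m0)
    outer_k = -(-cols // k0)  # ceil(cols / k0)
    packed = []
    for mo in range(outer_m):
        tiles_in_row = []
        for ko in range(outer_k):
            tile = []
            for mi in range(m0):
                r = mo * m0 + mi
                tile.append([
                    matrix[r][ko * k0 + ki]
                    if r < rows and ko * k0 + ki < cols
                    else pad_value
                    for ki in range(k0)
                ])
            tiles_in_row.append(tile)
        packed.append(tiles_in_row)
    return packed


# A 3 x 3 matrix packed into 2 x 2 tiles; the ragged edge is zero-padded.
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
p = pack_2d(a, 2, 2)
print(p[0][1])  # [[3, 0], [6, 0]]
print(p[1][1])  # [[9, 0], [0, 0]]
```

After packing, each tile is contiguous, so an inner kernel can stream through one tile at a time. The zero-filled edge tiles also show one way tiled layouts can use more memory than the original layout.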
1.2 Convolution
- Transposed the filter layout of input-backward convolutions from CHWF to FHWC, aligning it with matmul_transpose_b and improving performance. (https://github.com/iree-org/iree/pull/22100)
- Reordered the iterator dimensions of input-backward convolutions to match the forward NHWC-FHWC convolution layout, simplifying autotuning and shape handling. (https://github.com/iree-org/iree/pull/22208)
- Enabled extract slice propagation during convolution padding to improve fusion opportunities. (https://github.com/iree-org/iree/pull/21948)
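The CHWF-to-FHWC filter change above is a dimension permutation. The minimal sketch below assumes the filter is a dense, row-major flat buffer and uses a hypothetical `transpose_chwf_to_fhwc` helper. IREE performs this relayout as an IR transformation during preprocessing, not with code like this. The sketch shows where each element moves: the value at (c, h, w, f) lands at (f, h, w, c).

Roughly, if F plays the role of the matmul N dimension and (H, W, C) are flattened into K, the FHWC filter becomes an N×K matrix with K innermost. That is the B-operand layout matmul_transpose_b consumes.

```python
def transpose_chwf_to_fhwc(buf, C, H, W, F):
    """Re-lay out a flat, row-major CHWF filter buffer as FHWC.

    In CHWF order, element (c, h, w, f) lives at ((c*H + h)*W + w)*F + f.
    In FHWC order, the same element lives at ((f*H + h)*W + w)*C + c.
    """
    if len(buf) != C * H * W * F:
        raise ValueError("buffer size does not match C*H*W*F")
    out = [None] * len(buf)
    for c in range(C):
        for h in range(H):
            for w in range(W):
                for f in range(F):
                    src = ((c * H + h) * W + w) * F + f
                    dst = ((f * H + h) * W + w) * C + c
                    out[dst] = buf[src]
    return out


# With H = W = 1 the relayout is a plain 2-D transpose of a C x F matrix.
print(transpose_chwf_to_fhwc([0, 1, 2, 3, 4, 5], C=2, H=1, W=1, F=3))
# [0, 3, 1, 4, 2, 5]
```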
1.3 Matmul & Vector Distribute
- Removed virtual MMAs from vector distribute matmul/conv pipelines to fix regressions and restore original performance on Punet configurations. (https://github.com/iree-org/iree/pull/22202)
- Added support for distributing subgroups across multiple M dimensions in vector distribute pipelines, improving parallel utilization. (https://github.com/iree-org/iree/pull/22000)
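Distributing subgroups across several M dimensions comes down to delinearizing a flat subgroup id over a per-dimension count, often called a basis. This is presumably useful when no single M dimension is large enough to occupy every subgroup on its own. The helper names, basis, and tile sizes below are hypothetical. This release switches the real lowering configuration from subgroup_m/n_count to a subgroup_basis (https://github.com/iree-org/iree/pull/21912), and that configuration carries more information than this sketch.

```python
def delinearize(linear_id, basis):
    """Split a flat id into per-dimension coordinates (last dim varies fastest)."""
    coords = []
    for size in reversed(basis):
        coords.append(linear_id % size)
        linear_id //= size
    return coords[::-1]


def subgroup_tile_offsets(subgroup_id, basis, tile_sizes):
    """Per-dimension offsets of the tile owned by one subgroup.

    basis[i]: number of subgroups laid out along dimension i.
    tile_sizes[i]: extent of each subgroup's tile along dimension i.
    """
    coords = delinearize(subgroup_id, basis)
    return [c * t for c, t in zip(coords, tile_sizes)]


# Hypothetical config: dims (M0, M1, N), 8 subgroups arranged as 2 x 2 x 2.
basis = [2, 2, 2]
tiles = [1, 16, 32]
for sg in range(8):
    print(sg, subgroup_tile_offsets(sg, basis, tiles))
```

Each of the 8 subgroups receives a distinct (M0, M1, N) tile. A single-M-dimension scheme would have only M1 available for splitting M.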
1.4 Others
- Added encoding propagation and fusion passes in the default dispatch creation path, improving layout-based fusion. (https://github.com/iree-org/iree/pull/22063)
- Introduced optional split-reduction size inference for batch normalization. (https://github.com/iree-org/iree/pull/21731)
- Fused broadcasts with attention consumers instead of producers, improving dimension inference and downstream fusion. (https://github.com/iree-org/iree/pull/22008)
- Updated ConvertAccGEMMToGEMM to support scaled GEMMs. (https://github.com/iree-org/iree/pull/22093)
- Pushed memref reshapes up ahead of empty tensor elimination to ensure correct dominance during bufferization. (https://github.com/iree-org/iree/pull/22045)
- Fixes and Refinements (https://github.com/iree-org/iree/pull/22222, https://github.com/iree-org/iree/pull/22106, https://github.com/iree-org/iree/pull/22179, https://github.com/iree-org/iree/pull/22041, https://github.com/iree-org/iree/pull/22143, https://github.com/iree-org/iree/pull/22095, https://github.com/iree-org/iree/pull/22233, https://github.com/iree-org/iree/pull/22197, https://github.com/iree-org/iree/pull/22195, https://github.com/iree-org/iree/pull/22033, https://github.com/iree-org/iree/pull/22031, https://github.com/iree-org/iree/pull/21997, https://github.com/iree-org/iree/pull/21910, https://github.com/iree-org/iree/pull/21970, https://github.com/iree-org/iree/pull/21952, https://github.com/iree-org/iree/pull/21900, https://github.com/iree-org/iree/pull/21890, https://github.com/iree-org/iree/pull/21665, https://github.com/iree-org/iree/pull/22100, https://github.com/iree-org/iree/pull/22208, https://github.com/iree-org/iree/pull/22045, https://github.com/iree-org/iree/pull/22202)
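Split reduction rewrites one long reduction into independent partial reductions over chunks, followed by a small final combine, which exposes more parallelism. The Python below sketches the idea behind the batch-normalization split-reduction bullet above; it is not IREE's pass. It computes a channel's mean and variance from per-chunk partial sums. `infer_split_size` is a hypothetical stand-in for size inference that picks the largest divisor of the reduction length not exceeding a target.

```python
def infer_split_size(n, max_chunk):
    """Hypothetical heuristic: the largest divisor of n that is <= max_chunk."""
    for size in range(min(n, max_chunk), 0, -1):
        if n % size == 0:
            return size
    return 1


def split_reduction_mean_var(values, max_chunk=4):
    """Per-channel mean and population variance via a two-stage reduction."""
    n = len(values)
    if n == 0:
        raise ValueError("cannot reduce an empty channel")
    chunk = infer_split_size(n, max_chunk)
    # Stage 1: independent partial sums, one pair per chunk (parallelizable).
    partials = [
        (sum(values[i:i + chunk]), sum(v * v for v in values[i:i + chunk]))
        for i in range(0, n, chunk)
    ]
    # Stage 2: combine the small list of partial results.
    total = sum(s for s, _ in partials)
    total_sq = sum(sq for _, sq in partials)
    mean = total / n
    return mean, total_sq / n - mean * mean


print(split_reduction_mean_var([1, 2, 3, 4, 5, 6, 7, 8]))  # (4.5, 5.25)
```

The sketch uses E[x^2] - E[x]^2 for brevity. That formula can lose precision on large inputs, so production kernels often combine partial results with a more stable scheme such as Chan's parallel variance update.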
2. Runtime
- Split hoisted async constant lifetimes to drastically reduce retained memory (e.g., 9 GB → 500 KB in large tiled workloads). (https://github.com/iree-org/iree/pull/21995)
- Added per-entry-point flags and workgroup size emission, preparing for new HAL APIs and better runtime introspection.
- ⚠️ Breaking change: the local executable library format is bumped to v0.6. (https://github.com/iree-org/iree/pull/21754, https://github.com/iree-org/iree/pull/22078, https://github.com/iree-org/iree/pull/21950)
- Updated GPU executable headers for versioning and added a new infer-format call to safely infer the executable data format and size.
- ⚠️ Breaking change: GPU executables must be recompiled. (https://github.com/iree-org/iree/pull/21763)
- Switched CPU matmul configuration to the linalg::LinalgOp interface for better op fusion and flexibility. (https://github.com/iree-org/iree/pull/21954)
- General Enhancements and Fixes (https://github.com/iree-org/iree/pull/22101, https://github.com/iree-org/iree/pull/22110, https://github.com/iree-org/iree/pull/22102, https://github.com/iree-org/iree/pull/22048, https://github.com/iree-org/iree/pull/21921, https://github.com/iree-org/iree/pull/22075)
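The release notes do not spell out how splitting hoisted constant lifetimes works, so the following is a toy model only. It assumes that, before this change, every value produced while evaluating hoisted constants inherited the long-lived constant lifetime. It further assumes that, after the change, only values actually stored to globals keep that lifetime. Under those assumptions, the sketch shows why retained memory can drop so sharply: large intermediates are no longer pinned. `HoistedValue`, `retained_bytes`, and the sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HoistedValue:
    name: str
    size_bytes: int
    stored_to_global: bool  # True if initialization keeps this value around


def retained_bytes(values, split_lifetimes):
    """Bytes still allocated once hoisted-constant initialization finishes.

    Toy model: without splitting, every value produced while evaluating the
    hoisted constants shares the long-lived constant lifetime; with
    splitting, only values stored to globals keep it and the intermediates
    are transient and can be released.
    """
    if split_lifetimes:
        return sum(v.size_bytes for v in values if v.stored_to_global)
    return sum(v.size_bytes for v in values)


# Hypothetical sizes: a large intermediate feeding a small final constant.
vals = [
    HoistedValue("intermediate", 8 * 1024**3, stored_to_global=False),
    HoistedValue("final", 512 * 1024, stored_to_global=True),
]
print(retained_bytes(vals, split_lifetimes=False))  # 8590458880
print(retained_bytes(vals, split_lifetimes=True))   # 524288
```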
Change Log
Git History
## What's Changed
* [DT] Fuse encoding ops more aggressively for multi-use, gather, and slices ops. by @hanhanW in https://github.com/iree-org/iree/pull/21830
* [Codegen][Tuner]: improve python binding to query target info by @bangtianliu in https://github.com/iree-org/iree/pull/21812
* [Codegen][Tuner] retire the C/Python binding for querying mma intrinsic. NFC. by @bangtianliu in https://github.com/iree-org/iree/pull/21816
* [Integrate] Drop llvm/llvm-project@b4c31dc revert. by @hanhanW in https://github.com/iree-org/iree/pull/21851
* [Encoding] Support SetEncoding on scaled contraction ops by @Max191 in https://github.com/iree-org/iree/pull/21825
* [Test] Add onnx_ops test suites with O2/O3 optimization level. by @hanhanW in https://github.com/iree-org/iree/pull/21838
* [CodeGen] Do not fuse parallel ops if they directly write to destination. by @hanhanW in https://github.com/iree-org/iree/pull/21837
* [GPU] Add pattern to fold fill into pad ops by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21864
* [Codegen][IGEMM] Do not pre-pad convs with CHW layout or small input channel size by @yzhang93 in https://github.com/iree-org/iree/pull/21839
* [GPU] Remove reshape by expansion in workgroup scope of combine layout pass by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21869
* [CPU] Remove passing tests from expected_compile_failures list. by @hanhanW in https://github.com/iree-org/iree/pull/21871
* [GPU] Use Affine map for size calculations of alloca's in fission pass by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21870
* [Codegen][AMDGPU] Fix matmul miscompile on RDNA4 by @kuhar in https://github.com/iree-org/iree/pull/21873
* [NFC] Code Quality changes by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/21876
* Avoid needles isa checks. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21885
* [VectorDistribute] Refactor layout configuration to a simpler logic by @Groverkss in https://github.com/iree-org/iree/pull/21883
* [StableHLO][CHLO]Refactor CHLO decompositions to follow upstream StableHLO by @LekkalaSravya3 in https://github.com/iree-org/iree/pull/21682
* Revert "[VectorDistribute] Refactor layout configuration to a simpler logic" by @Groverkss in https://github.com/iree-org/iree/pull/21887
* [docs] Clarify compiler coding standards by @kuhar in https://github.com/iree-org/iree/pull/21886
* Upgrade Preprocessing and Modules to free create functions. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21877
* [Codegen] Upgrade Common, SPIRV, VMVX to free create functions. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21879
* [Codegen] Upgrade LLVMCPU and LLVMGPU to free create functions. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21880
* [Codegen] Upgrade Dialect and Interfaces to free create functions. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21881
* Add gfx950 ukernel patterns by @sebvince in https://github.com/iree-org/iree/pull/21856
* Bump version to 3.8.0 after 3.7.0 release. by @sa-faizal in https://github.com/iree-org/iree/pull/21852
* [docs] Update the file config file for running ONNX operator tests on CPU. by @hanhanW in https://github.com/iree-org/iree/pull/21892
* Upgrade GlobalOpt, InputConversion, ExternalInterfacess to free create function. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21878
* [Codegen] Upgrade Transforms and Utils to free create functions. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21882
* [ROCM] Update Ukernel infra to handle InnerTiledOp/Multi_MMA_MFMA by @Abhishek-Varma in https://github.com/iree-org/iree/pull/21759
* Reland "[VectorDistribute] Refactor layout configuration to a simpler logic" by @Groverkss in https://github.com/iree-org/iree/pull/21895
* Upgrade IREE plugins to free create functions. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21896
* [GPU] Remove MMAScheduleAttr by @Groverkss in https://github.com/iree-org/iree/pull/21884
* [LLVMCPU] Respect dominance when doing replacement of tile and fused values by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21901
* [Codegen] Upgrade iree dialects to free create functions. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21898
* Integrate LLVM at [llvm-project/llvm@daf8f9](https://github.com/llvm-project/llvm/commit/daf8f9fc1ccc6c5679bc89058fd66d8ea4da9d59) by @rkayaith in https://github.com/iree-org/iree/pull/21893
* Upgrade all remaining code to free create functions. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21902
* [LLVMGPU] Move LLVMGPUVectorLowering after OptimizeIntArithmetic by @Max191 in https://github.com/iree-org/iree/pull/21597
* [Codegen] Promote scales to LDS by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/21767
* Bump the github-actions group with 2 updates by @dependabot[bot] in https://github.com/iree-org/iree/pull/21897
* Integrate llvm/llvm-project@31bee3421ba4 by @rkayaith in https://github.com/iree-org/iree/pull/21905
* [CPU] Tile all the ops to target vector sizes before vectorization. by @hanhanW in https://github.com/iree-org/iree/pull/21900
* [LinalgExt] Fold subview ops into map_scatter output before decomposing by @Max191 in https://github.com/iree-org/iree/pull/21891
* [GPU] Do not do c promotion for unaligned (I)GEMMs by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21823
* [Codegen][ROCm] Add repro instructions for .rocmasm files by @kuhar in https://github.com/iree-org/iree/pull/21874
* [LinalgExt] Fix `FoldWithProducerReshapeByExpansion` for >1 dyn dim by @IanWood1 in https://github.com/iree-org/iree/pull/21894
* At the beginning of emulate narrow type, flatten incoming memrefs by @lialan in https://github.com/iree-org/iree/pull/21910
* Revert "Disable failing ARM-SME tests. (#21715)" by @banach-space in https://github.com/iree-org/iree/pull/21860
* [Codegen][AMDGPU] Drop backend reverts, emergency RDNA4 lowering fix by @krzysz00 in https://github.com/iree-org/iree/pull/21906
* [codegen] more consumer fusion by @jtuyls in https://github.com/iree-org/iree/pull/21848
* [CPU][DT] Add codegen support for broadcast/dequant -> matmul dispatch. by @hanhanW in https://github.com/iree-org/iree/pull/21911
* [Codegen][IGEMM] Set convolution pre-padding as default by @yzhang93 in https://github.com/iree-org/iree/pull/21899
* [Codegen][GenericVectorization] Fix incorrect usage of std::accumulation that led to overflow by @mshockwave in https://github.com/iree-org/iree/pull/21920
* Integrate llvm/llvm-project@e92cbfbe3087 by @rkayaith in https://github.com/iree-org/iree/pull/21917
* [Codegen][Cleanup] Always enable vectorization for padding and gather. by @hanhanW in https://github.com/iree-org/iree/pull/21924
* [Test] Disable AMDGPU onnx_ops test suite (O0) job. by @hanhanW in https://github.com/iree-org/iree/pull/21929
* Integrate llvm/torch-mlir@7000187b by @rkayaith in https://github.com/iree-org/iree/pull/21918
* Bump nanobind version by @Hardcode84 in https://github.com/iree-org/iree/pull/21926
* [iree][codegen] Add `#iree_codegen.denormal_fp_math` to set denormals behavior by @fabianmcg in https://github.com/iree-org/iree/pull/21840
* [ROCM] Add back specialization pattern tests by @jtuyls in https://github.com/iree-org/iree/pull/21939
* Fix `--iree-hip-target` validation by @bjacob in https://github.com/iree-org/iree/pull/21909
* Integrate LLVM at llvm/llvm-project@b22f94dcc58e by @rkayaith in https://github.com/iree-org/iree/pull/21943
* [GPU] Propagate extract slice when doing convolution padding by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21948
* [CPU] Adjust tile sizes for mmt4d dispatches that have relayout ops. by @hanhanW in https://github.com/iree-org/iree/pull/21934
* Fixing CSE of hoisted encoding ops. by @benvanik in https://github.com/iree-org/iree/pull/21921
* Adding util.list.construct pseudo-op. by @benvanik in https://github.com/iree-org/iree/pull/21950
* [Dispatch Creation] Don't fuse no input producer with reduction by @IanWood1 in https://github.com/iree-org/iree/pull/21930
* Revert "[LinalgExt] Fix `FoldWithProducerReshapeByExpansion` for >1 dyn dim" by @IanWood1 in https://github.com/iree-org/iree/pull/21947
* [DispatchCreation]: Add FormSplitReductionDispatchesPass support for ArgCompare op by @bangtianliu in https://github.com/iree-org/iree/pull/21903
* [CPU] Add an experimental flag to disable linalg.conv generalization. by @hanhanW in https://github.com/iree-org/iree/pull/21953
* [GPU][DT] Add pingpong ukernels for data tiling (f8 and f16) by @Yu-Zhewen in https://github.com/iree-org/iree/pull/21919
* [ROCM][DT] Add encoding specialization infra for data-tiled ukernels by @jtuyls in https://github.com/iree-org/iree/pull/21914
* [GPU] Use UkernelDescriptor and deprecate UkernelConfigAttr and GPULowerToUkernelsPass by @Abhishek-Varma in https://github.com/iree-org/iree/pull/21766
* [docs] Update docs on sdxl golden output by @efric in https://github.com/iree-org/iree/pull/21936
* Fix Dispatch Creation TransformOptions by @IanWood1 in https://github.com/iree-org/iree/pull/21964
* Integrate LLVM at llvm/llvm-project@ed1f1b8 by @rkayaith in https://github.com/iree-org/iree/pull/21963
* [docs] Add a blog post for data-tiling introduction. by @hanhanW in https://github.com/iree-org/iree/pull/21774
* Avoid needless isa checks. NFC. by @bangtianliu in https://github.com/iree-org/iree/pull/21968
* `using Base::Base` in tablegen passes. by @benvanik in https://github.com/iree-org/iree/pull/21969
* [iree][codegen] Set `#iree_codegen.denormal_fp_math` in attention dispatches by @fabianmcg in https://github.com/iree-org/iree/pull/21940
* [compiler][NFC] Update remaining code to free create functions. by @hanhanW in https://github.com/iree-org/iree/pull/21972
* [plugins][NFC] Upgrade plugins/ to free create functions. by @hanhanW in https://github.com/iree-org/iree/pull/21973
* [GPU][DT] Update data layout strategy for pingpong ukernels by @Yu-Zhewen in https://github.com/iree-org/iree/pull/21957
* Using explicit operation types in passes. by @benvanik in https://github.com/iree-org/iree/pull/21971
* Converting compiler/Bindings/ to tablegen Passes.td. by @benvanik in https://github.com/iree-org/iree/pull/21974
* [Codegen] Unroll instead of linearize vector.to_elements. by @amd-eochoalo in https://github.com/iree-org/iree/pull/21959
* [Codegen] Added erf ; FastMath rewrite for vector types. by @keshavvinayak01 in https://github.com/iree-org/iree/pull/21849
* Adding hal.executable `lazy` flag. by @benvanik in https://github.com/iree-org/iree/pull/21966
* Don't inline immutable globals with non-util dialect attrs. by @benvanik in https://github.com/iree-org/iree/pull/21986
* [Codegen][RISCV] Do not lower vector.gather to branches in the presence of RVV by @mshockwave in https://github.com/iree-org/iree/pull/21927
* [GPU] Only combine complex relayout chains in GPUCombineLayoutTransformation by @Max191 in https://github.com/iree-org/iree/pull/21985
* [LLVMGPU] Move masked load optimizations after vector lowering by @Max191 in https://github.com/iree-org/iree/pull/21962
* [iree-test-suites] Update golden benchmark numbers by @Max191 in https://github.com/iree-org/iree/pull/21980
* [Encoding] Deprecate MatmulKAttr encoding attribute. by @hanhanW in https://github.com/iree-org/iree/pull/21976
* [Codegen] Make collapse_shape hoisting pattern work with store_to_buffer by @Max191 in https://github.com/iree-org/iree/pull/21999
* [LinalgExt] Add canonicalization to convert identity map_scatter to copy by @Max191 in https://github.com/iree-org/iree/pull/21998
* [GPU][DT] Add benchmark files for llama_8b_f16 with data-tiling. by @hanhanW in https://github.com/iree-org/iree/pull/21975
* [Encoding] set default option for scaled matmul encodings to false by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/21994
* Marking `stream.async.dispatch` as pure. by @benvanik in https://github.com/iree-org/iree/pull/21989
* [Dispatch Creation] Allow fusing pad with split reduction dispatch by @IanWood1 in https://github.com/iree-org/iree/pull/21987
* Fix data race in GPU C ukernels caching of shared memory size by @bjacob in https://github.com/iree-org/iree/pull/22004
* [LinalgExt][NFC] Remove unused code in TransposeFusion by @IanWood1 in https://github.com/iree-org/iree/pull/22006
* [Dispatch Creation] Rework dispatch formation logic by @IanWood1 in https://github.com/iree-org/iree/pull/21854
* [TensorExt] Fix dynamic dim canonicalization in bitcast folder by @jtuyls in https://github.com/iree-org/iree/pull/21997
* [CPU] Switch matmul config to use linalg::LinalgOp interface. by @hanhanW in https://github.com/iree-org/iree/pull/21954
* Bump the github-actions group with 2 updates by @dependabot[bot] in https://github.com/iree-org/iree/pull/21992
* Integrate LLVM at llvm/llvm-project@0648c5183f32 by @qedawkins in https://github.com/iree-org/iree/pull/22003
* [VectorDistribute] Use subgroup_basis instead of subgroup_m/n_count by @Groverkss in https://github.com/iree-org/iree/pull/21912
* Splitting hoisted async constant lifetime. by @benvanik in https://github.com/iree-org/iree/pull/21995
* Adding iree_hal_executable_export_info_t and queries. by @benvanik in https://github.com/iree-org/iree/pull/21754
* Respect user `FILECHECK_OPTS`/`LIT_OPTS` environment variables when running through ctest by @rkayaith in https://github.com/iree-org/iree/pull/22019
* [PassUtils] Allow passing overload constructors to `addPredicatedPass` by @rkayaith in https://github.com/iree-org/iree/pull/22021
* [NFC] remove unused header files by @bangtianliu in https://github.com/iree-org/iree/pull/21977
* [Codegen][GPU] Enable TileAndFuse for matmul by default by @jerryyin in https://github.com/iree-org/iree/pull/21834
* [CPU] Populate to_elements unrolling patterns in LLVM conversion. by @hanhanW in https://github.com/iree-org/iree/pull/22010
* Fix mi308 Pkgci failures by @IanWood1 in https://github.com/iree-org/iree/pull/22028
* [Dispatch Creation] Fuse bcast with attention instead of producer by @IanWood1 in https://github.com/iree-org/iree/pull/22008
* [CPU] Add precondition to kernel dispatch method selection for gemm. by @hanhanW in https://github.com/iree-org/iree/pull/22031
* [Codegen][Tuner] update lowering config binding for subgroup basis by @bangtianliu in https://github.com/iree-org/iree/pull/22027
* Fix indices in scaled matmul rank assert by @jtuyls in https://github.com/iree-org/iree/pull/22016
* [GPU] Introduce Intentional Padded Configurations for (I)GEMM by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21931
* [CI] Disabling WebGPU build due to CI failures. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/22030
* [DispatchCreation] Add option to infer split-reduction sizes for batchnorm by @rkayaith in https://github.com/iree-org/iree/pull/21731
* Implement `iree_gpu.coalesced_gather_dma` op by @lialan in https://github.com/iree-org/iree/pull/21846
* [LinalgExt] Support map_scatter decomposition with strided memrefs by @Max191 in https://github.com/iree-org/iree/pull/21952
* [Codegen] Tile map_scatter op for large vector sizes by @Max191 in https://github.com/iree-org/iree/pull/22035
* [DispatchCreation] Fix `iree-compile` split-reduction flag name by @rkayaith in https://github.com/iree-org/iree/pull/22038
* Integrate LLVM at llvm/llvm-project@dffd7f3d9a3 by @qedawkins in https://github.com/iree-org/iree/pull/22023
* [NFC][ROCM] Refactor bitcode ukernel to a separate file by @Abhishek-Varma in https://github.com/iree-org/iree/pull/21983
* [VectorDistribute] Allow distributing subgroups on multiple m dimensions by @Groverkss in https://github.com/iree-org/iree/pull/22000
* [LLVMGPU] Vectorize map_scatter in LLVMGPUTileAndFuse pipeline by @Max191 in https://github.com/iree-org/iree/pull/21890
* [Codegen] Push up memref reshapes before empty tensor elimination by @Max191 in https://github.com/iree-org/iree/pull/22045
* [LLVMGPU] Add support for direct convolution in tile and fuse pipeline by @yzhang93 in https://github.com/iree-org/iree/pull/22033
* LLVM-Integrate: Drop revert for f645d209d by @qedawkins in https://github.com/iree-org/iree/pull/22044
* Integrate llvm/llvm-project@50ef746a12 by @qedawkins in https://github.com/iree-org/iree/pull/22046
* [Encoding] Propagate layout encodings through tensor.cast ops by @Max191 in https://github.com/iree-org/iree/pull/21970
* [python] Expose python bindings for nvvm in iree.compiler.dialects by @saladpalad in https://github.com/iree-org/iree/pull/21993
* Revert "[Dispatch Creation] Rework dispatch formation logic (#21854)" by @IanWood1 in https://github.com/iree-org/iree/pull/22058
* [Codegen] Add transform ops for matching contraction ops by @bangtianliu in https://github.com/iree-org/iree/pull/21981
* Integrate llvm/llvm-project@1ee18959bcdf by @efric in https://github.com/iree-org/iree/pull/22062
* Disable data-tiling flag by default and refresh the CPU docs. by @hanhanW in https://github.com/iree-org/iree/pull/21935
* [DispatchCreation] Propagate and fuse encodings in default path by @Max191 in https://github.com/iree-org/iree/pull/22063
* [LinalgExt][NFC] Remove unused VectorOps include by @hanhanW in https://github.com/iree-org/iree/pull/22066
* Integrate llvm/llvm-project@9d48df7a92e7 by @efric in https://github.com/iree-org/iree/pull/22064
* [LLVMGPU] Enable iree-llvmgpu-test-combine-layout-transformation by default by @Max191 in https://github.com/iree-org/iree/pull/21979
* Adding interface support for stream.async.transfer result placement. by @benvanik in https://github.com/iree-org/iree/pull/22048
* Marking stream.tensor.dispatch pure. by @benvanik in https://github.com/iree-org/iree/pull/22075
* [GPU][DT] Add data-tiling resolver by default. by @hanhanW in https://github.com/iree-org/iree/pull/22074
* [Util] Allow varying types in optimization barrier by @qedawkins in https://github.com/iree-org/iree/pull/22076
* [ROCm] Add an experimental target for gfx1250 by @kuhar in https://github.com/iree-org/iree/pull/22077
* Remove e2e matmul tests with explicit compilation-info by @bjacob in https://github.com/iree-org/iree/pull/22085
* [NFC] Improving consistency of Util/Transforms/Passes.h. by @benvanik in https://github.com/iree-org/iree/pull/22078
* e2e matmul tests covering vector-distribution by @bjacob in https://github.com/iree-org/iree/pull/22086
* Reapply "[LinalgExt] Fix `FoldWithProducerReshapeByExpansion` for >1 … by @IanWood1 in https://github.com/iree-org/iree/pull/22088
* [Test] Trim data-tiling compile flags from tests. by @hanhanW in https://github.com/iree-org/iree/pull/22092
* [Codegen] Support scaled matmul in ConvertAccGEMMToGEMM by @Max191 in https://github.com/iree-org/iree/pull/22093
* [GPU] Fix bug in shared memory computation for scaled intrinsics by @Max191 in https://github.com/iree-org/iree/pull/22095
* Integrate llvm/llvm-project@876296e9b7f0 by @efric in https://github.com/iree-org/iree/pull/22097
* [Codegen] Add transform op for matching dimension sizes. by @bangtianliu in https://github.com/iree-org/iree/pull/22040
* Revert e2e matmul tests changes by @bjacob in https://github.com/iree-org/iree/pull/22111
* [mlir][amdgpu] Replaced `nullopt` with target arch chipset in `populateGpuPromoteShuffleToAMDGPUPatterns` pass by @xintin in https://github.com/iree-org/iree/pull/21799
* Display a warning when we spill SGPRs or VGPRs by @sebvince in https://github.com/iree-org/iree/pull/21863
* [Codegen][GPU] Fix MMA Intrinsics Sorting by @bangtianliu in https://github.com/iree-org/iree/pull/22090
* [Codegen][GPU][NFC] Fix mma sort follow up by @bangtianliu in https://github.com/iree-org/iree/pull/22122
* [DT] Add support for materializing func.func args with encodings. by @hanhanW in https://github.com/iree-org/iree/pull/22115
* Break `generate_e2e_matmul_test.py` into multiple files by @bjacob in https://github.com/iree-org/iree/pull/22120
* NFC: Simplify generation of e2e matmul test functions. by @bjacob in https://github.com/iree-org/iree/pull/22123
* [ROCm] Fix up gfx1250 definitions by @kuhar in https://github.com/iree-org/iree/pull/22131
* Clean up KnownTargets.cpp. NFC. by @kuhar in https://github.com/iree-org/iree/pull/22133
* Fix the Windows build: portably set environment variable PYTHONPATH. by @bjacob in https://github.com/iree-org/iree/pull/22136
* [Codegen][ROCm] Attempt to fix MMA sorting CI failures by @kuhar in https://github.com/iree-org/iree/pull/22141
* [NFC][GlobalOpt] Update function names in LIT by @AGindinson in https://github.com/iree-org/iree/pull/22083
* [VectorDistribute] Flush denormals for attention reduction config by @Groverkss in https://github.com/iree-org/iree/pull/22041
* Simplify op conversion pattern inheriting constructor definitions. NFC. by @kuhar in https://github.com/iree-org/iree/pull/22143
* Simplify op rewrite pattern inheriting constructor definitions. NFC. by @kuhar in https://github.com/iree-org/iree/pull/22142
* [LLVMGPU] Don't use DMA for scaled matmul by @Max191 in https://github.com/iree-org/iree/pull/22094
* [Codegen] Add bufferization support for new `iree_gpu.coalesced_gather_dma` op by @lialan in https://github.com/iree-org/iree/pull/22049
* [GPU] Support iree_codegen.load_from_buffer in GPUBubbleResourceCasts by @Max191 in https://github.com/iree-org/iree/pull/22140
* [Preprocessing] Add pass to sink transpose through pad by @IanWood1 in https://github.com/iree-org/iree/pull/22106
* [LLVMGPU] Unroll elementwise operations by @Groverkss in https://github.com/iree-org/iree/pull/21665
* Bump actions/cache from 4.2.4 to 4.3.0 in the github-actions group by @dependabot[bot] in https://github.com/iree-org/iree/pull/22152
* Increase golden time. by @amd-eochoalo in https://github.com/iree-org/iree/pull/22159
* [Codegen][AMDGPU] Fix incorrect canonical map for MXFP RHS scales by @krzysz00 in https://github.com/iree-org/iree/pull/22162
* [Preprocessing] Transpose conv filter layout from CHWF to FHWC by @yzhang93 in https://github.com/iree-org/iree/pull/22100
* Integrate llvm/llvm-project@7af31bf by @amd-eochoalo in https://github.com/iree-org/iree/pull/22148
* [LLVMGPU][Codegen] Increase parallel rows read for matvec by @efric in https://github.com/iree-org/iree/pull/22163
* [Codegen] support matching any values for dims_equal transform op by @bangtianliu in https://github.com/iree-org/iree/pull/22149
* Integrate llvm/llvm-project@a33544b by @amd-eochoalo in https://github.com/iree-org/iree/pull/22167
* E2E MXFP4 matmul tests by @bjacob in https://github.com/iree-org/iree/pull/22170
* [Test][NFC] Drop `input_type` from e2e tests because IREE can infer the input type. by @hanhanW in https://github.com/iree-org/iree/pull/22014
* [DispatchCreation] infer split-reduction sizes for ArgCompare by @bangtianliu in https://github.com/iree-org/iree/pull/22154
* [Codegen][AMDGPU] Tile and convert gather to coalesced DMA by @lialan in https://github.com/iree-org/iree/pull/22157
* Revert "[LLVMGPU] Unroll elementwise operations (#21665)" by @MaheshRavishankar in https://github.com/iree-org/iree/pull/22186
* Integrate llvm/llvm-project@0cb9d40 by @amd-eochoalo in https://github.com/iree-org/iree/pull/22182
* Port e2e matmul tests from gfx942 to gfx950 by @bjacob in https://github.com/iree-org/iree/pull/22191
* [E2E-Matmul] Remove redundant flag from scaled matmul e2e test by @Max191 in https://github.com/iree-org/iree/pull/22190
* [Codegen][AMDGPU] Enable gpu.printf patterns by @krzysz00 in https://github.com/iree-org/iree/pull/22192
* [GPU] Add thread tile size inference for map_scatter op by @Abhishek-Varma in https://github.com/iree-org/iree/pull/22179
* [DT][ROCM] Fix inner_tiled bitcode ukernel lowering with instrinsicsM(N) = 1 by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22184
* [Codegen] Add `ResolveShapedTypeResultDimsPass` pass to GPU vector distribute by @fabianmcg in https://github.com/iree-org/iree/pull/22196
* [LinalgExt] Introduce linalg_ext.exp_reduction by @hhkit in https://github.com/iree-org/iree/pull/21761
* Integrate llvm/llvm-project@4845b3e by @amd-eochoalo in https://github.com/iree-org/iree/pull/22200
* [GPU] Allow multi result and indexing compute generic ops in TilleAndFuse pipeline by @nirvedhmeshram in https://github.com/iree-org/iree/pull/22195
* [Codegen][GPU] Fix IGEMM pre-padding and fusion patterns by @yzhang93 in https://github.com/iree-org/iree/pull/22197
* [DataTiling] Introduce DataTiledMMAInterfaceAttr by @Max191 in https://github.com/iree-org/iree/pull/22098
* [Codegen] Follow-up Fix for MatchContractionOp by @bangtianliu in https://github.com/iree-org/iree/pull/22201
* Removing virtual MMAs from vector distribute matmul/conv pipeline by @jerryyin in https://github.com/iree-org/iree/pull/22202
* [Codegen] Add transform op for matching convolution ops by @bangtianliu in https://github.com/iree-org/iree/pull/22194
* Revert "[GPU] Allow multi result and indexing compute generic ops in TilleAndFuse pipeline" by @IanWood1 in https://github.com/iree-org/iree/pull/22205
* [Codegen] Fix premature return in iree_codegen.inner_tiled verifier by @Max191 in https://github.com/iree-org/iree/pull/22183
* [Codegen][LLVMGPU] Later scf-to-cf to support math.erf by @newling in https://github.com/iree-org/iree/pull/21817
* [CI] Add dummy torch pkgci by @Groverkss in https://github.com/iree-org/iree/pull/22203
* [DataTiling][GPU] Introduce DataTiledScaledMMAAttr by @Max191 in https://github.com/iree-org/iree/pull/22176
* [Preprocessing] Reorder the iterator dims to match forward NHWC-FHWC convs by @yzhang93 in https://github.com/iree-org/iree/pull/22208
* Decrease llama 8b_f16_decode golden time by @efric in https://github.com/iree-org/iree/pull/22220
* [DataTiling][GPU] Implement scaled matmul data tiling materialization by @Max191 in https://github.com/iree-org/iree/pull/22189
* Integrate llvm/llvm-project@95e0ae9f by @newling in https://github.com/iree-org/iree/pull/22214
* Control const expr hoisting in Dispatch Creation by @IanWood1 in https://github.com/iree-org/iree/pull/22164
* [codegen] Fix test after PR 22196 by @fabianmcg in https://github.com/iree-org/iree/pull/22218
* [codegen][gpu] GPUApplyPaddingLevel: fold case where no padding by @newling in https://github.com/iree-org/iree/pull/22193
* [GlobalOpt] Use Option<> for TransformOptions by @IanWood1 in https://github.com/iree-org/iree/pull/22222
* [ROCm] Enable e2e stablehlo tests by @kuhar in https://github.com/iree-org/iree/pull/22224
* Adding a LiftCFGToSCFPass. by @benvanik in https://github.com/iree-org/iree/pull/22101
* Improving support for unreachable control flow in both CFG and SCF. by @benvanik in https://github.com/iree-org/iree/pull/22102
* Adding VerifyStructuredControlFlowPass. by @benvanik in https://github.com/iree-org/iree/pull/22110
* Fix g++ warning -Werror=parentheses by @IanWood1 in https://github.com/iree-org/iree/pull/22225
* Update CODEOWNERS to include new tests and dialect owners by @Groverkss in https://github.com/iree-org/iree/pull/22213
* [CI] Add clip and llama torch_models tests by @Groverkss in https://github.com/iree-org/iree/pull/22212
* [samples] Update PyTorch JIT notebook for Python 3.12 by @HeatCrab in https://github.com/iree-org/iree/pull/22209
* [Codegen] add transform op for matching attention op by @bangtianliu in https://github.com/iree-org/iree/pull/22199
* Fix linking MSVC error from forward declaration used as templated type by @Max191 in https://github.com/iree-org/iree/pull/22233
* [PkgCI] Use urllib instead of github cli in pkgci artifact_run by @Groverkss in https://github.com/iree-org/iree/pull/22211
* Transposed Workgroup Reordering for large rectangular matmuls by @sebvince in https://github.com/iree-org/iree/pull/22165
* Fix typo ConditionalTranspose attribute description by @sebvince in https://github.com/iree-org/iree/pull/22238
* [docs] Add LLVM debugging and some AMDGPU-specific tips by @krzysz00 in https://github.com/iree-org/iree/pull/22146
* Integrate llvm/llvm-project@7546bd3 by @newling in https://github.com/iree-org/iree/pull/22234
* [CI][iree-test-suites] Add random weight 8b_fp8 and 8b_fp16 benchmarks by @Groverkss in https://github.com/iree-org/iree/pull/22239
* Integrate llvm/llvm-project@327a89c by @newling in https://github.com/iree-org/iree/pull/22255
* [CI][iree-test-suites] Upload json summary for torch_models CI by @Groverkss in https://github.com/iree-org/iree/pull/22253
* [CI][iree-test-suites] Update ref for iree-test-suites by @Groverkss in https://github.com/iree-org/iree/pull/22263
* [build flags] prepare to enable more warnings in compile flags (#21996) by @schuermans-roofline in https://github.com/iree-org/iree/pull/22252
* [Codegen] Update the assembly formats and corresponding tests for matcher ops by @bangtianliu in https://github.com/iree-org/iree/pull/22270
New Contributors
- @LekkalaSravya3 made their first contribution in https://github.com/iree-org/iree/pull/21682
- @saladpalad made their first contribution in https://github.com/iree-org/iree/pull/21993
- @xintin made their first contribution in https://github.com/iree-org/iree/pull/21799
- @hhkit made their first contribution in https://github.com/iree-org/iree/pull/21761
- @HeatCrab made their first contribution in https://github.com/iree-org/iree/pull/22209
- @schuermans-roofline made their first contribution in https://github.com/iree-org/iree/pull/22252
Full Changelog: https://github.com/iree-org/iree/compare/v3.7.0...v3.8.0