Highlights of the IREE v3.7 Release
1. Compiler
1.1 FP4 and bfloat16 Support on CPUs:
- Enabled software-based FP4 execution on CPU backends, including scaling_extf and scaling_truncf conversions (see the sketch after this list). (https://github.com/iree-org/iree/pull/21413)
- Moved bfloat expansion patterns earlier to avoid conflicts with arith-to-llvm conversion. (https://github.com/iree-org/iree/pull/21413)
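A minimal NumPy sketch of what a scaled FP4 (e2m1) truncate/extend round trip does numerically, in the spirit of the scaling_truncf/scaling_extf conversions above. The value table follows the OCP MX e2m1 format; the helper names and nearest-value rounding are illustrative assumptions, not IREE's actual lowering.

```python
import numpy as np

# The eight non-negative values representable in e2m1; negatives mirror them.
E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def scaling_truncf_sketch(x: np.ndarray, scale: np.float32) -> np.ndarray:
    """Quantize f32 to the nearest e2m1 value after dividing by the shared scale."""
    scaled = x / scale
    signs = np.sign(scaled)
    mags = np.abs(scaled)
    idx = np.abs(mags[..., None] - E2M1_VALUES).argmin(axis=-1)
    return signs * E2M1_VALUES[idx]  # quantized values, still held as f32 here

def scaling_extf_sketch(q: np.ndarray, scale: np.float32) -> np.ndarray:
    """Dequantize: multiply the e2m1 value back by the shared scale."""
    return q * scale

x = np.array([0.1, -0.7, 2.4, 5.0], dtype=np.float32)
q = scaling_truncf_sketch(x, np.float32(2.0))
print(q)                                        # [ 0.  -0.5  1.   2. ] in e2m1 units
print(scaling_extf_sketch(q, np.float32(2.0)))  # [ 0.  -1.   2.   4. ] after rescaling
```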
1.2 Dispatch and Fusion Improvements:
- Fused bit-truncate ops with producers during dispatch creation to improve fusion consistency. (https://github.com/iree-org/iree/pull/21346)
- Fused reshape op chains with set_encoding ops to increase producer fusion opportunities. (https://github.com/iree-org/iree/pull/21365)
- Allowed dynamic quantized kernels to fuse into a single dispatch by relaxing consumer fusion restrictions. (https://github.com/iree-org/iree/pull/21492)
- Expanded encoding op fusion support to include reductions and attention ops. (https://github.com/iree-org/iree/pull/21612)
- Enabled more aggressive consumer fusion, improving layer normalization fusion and overall fusion effectiveness. (https://github.com/iree-org/iree/pull/21521)
1.3 GPU Codegen and Optimization:
- Added fallback patterns for fp4 and f8E8M0FNU conversions in LLVMGPU backend. (https://github.com/iree-org/iree/pull/21453)
- Improved attention configuration by sorting intrinsic pairs and enabling intrinsic reuse, optimizing matmul layouts. (https://github.com/iree-org/iree/pull/21448)
- Added support for resolving swizzling hints with GatherToLDSOp on AMDGPU. (https://github.com/iree-org/iree/pull/21478)
- Introduced a heuristic strategy that reduces tile sizes to better distribute workloads across CUs on GPUs (notably MI300X/MI308X); see the sketch after this list. (https://github.com/iree-org/iree/pull/21546)
- Considered operand bitwidth when choosing thread-level vector sizes for reduction vector distribution. This improves performance on matvec kernels by up to 20% on MI300X. (https://github.com/iree-org/iree/pull/21438)
- Distributed reductions to a single subgroup on large parallel dimensions, improving performance of matvec kernels by ~3-4% on MI355X. (https://github.com/iree-org/iree/pull/21499)
- Enabled tiling of fully dynamic root ops to GPU subgroup sizes to maximize parallelism utilization. (https://github.com/iree-org/iree/pull/21526)
- Set fast-math flags on arithmetic ops in the LLVMGPU backend to enable FMA fusion. (https://github.com/iree-org/iree/pull/21528)
- Added XOR-based swizzle attribute for memory layout optimizations. (https://github.com/iree-org/iree/pull/21562)
- Added pattern to sink extract_slice ops through generic ops to improve fusion. (https://github.com/iree-org/iree/pull/21796)
- Used arithmetic intensity metrics (peak TFLOPs, memory bandwidth) to guide GEMM size categorization and seed selection in the heuristic; see the sketch after this list. (https://github.com/iree-org/iree/pull/21638)
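Two of the heuristics above lend themselves to small worked examples. The sketch below is a rough illustration, not IREE's implementation: it shows the occupancy reasoning behind shrinking tile sizes until the workgroup count covers the available CUs, and a roofline-style arithmetic-intensity check for categorizing a GEMM. The CU count, peak-TFLOP, and bandwidth figures are placeholders rather than the thresholds IREE uses.

```python
import math

def workgroup_count(m: int, n: int, tile_m: int, tile_n: int) -> int:
    """Number of workgroups a tiled GEMM produces (one per output tile)."""
    return math.ceil(m / tile_m) * math.ceil(n / tile_n)

def shrink_tiles_to_fill_cus(m, n, tile_m, tile_n, num_cus=304):  # 304 CUs, MI300X-class
    """Halve tile sizes until there are at least as many tiles as CUs."""
    while workgroup_count(m, n, tile_m, tile_n) < num_cus and min(tile_m, tile_n) > 16:
        if tile_m >= tile_n:
            tile_m //= 2
        else:
            tile_n //= 2
    return tile_m, tile_n

def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    flops = 2.0 * m * n * k                                   # multiply-accumulates
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)    # A, B reads + C write
    return flops / bytes_moved

def categorize_gemm(m, n, k, peak_tflops=1300.0, hbm_tbps=5.3):
    machine_balance = peak_tflops / hbm_tbps                  # FLOPs per byte at the roofline ridge
    ai = gemm_arithmetic_intensity(m, n, k)
    return "compute-bound" if ai >= machine_balance else "memory-bound"

print(shrink_tiles_to_fill_cus(m=2048, n=2048, tile_m=256, tile_n=256))  # smaller tiles -> enough workgroups
print(categorize_gemm(4096, 4096, 4096), categorize_gemm(1, 4096, 4096))  # compute-bound, memory-bound
```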
1.4 Convolution and IGEMM Enhancements:
- Added convolution padding before IGEMM conversion to improve vectorization and contiguous memory access (see the sketch after this list). (https://github.com/iree-org/iree/pull/21470)
- Fixed pre-padding for group convolutions in the IGEMM path by treating the depth dimension as a batch dimension. (https://github.com/iree-org/iree/pull/21583)
- Improved transpose fusion for conv operations, allowing conv+transpose fusion into a single dispatch, beneficial for PyTorch input pipelines. (https://github.com/iree-org/iree/pull/21778)
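For context on the IGEMM items above, the sketch below shows how implicit GEMM maps an NHWC convolution onto matmul dimensions and why pre-padding the shared reduction (K) dimension helps keep loads contiguous and fully vectorized. The shapes and the pad-to-16 rule are illustrative assumptions, not IREE's actual padding logic.

```python
import math

def igemm_dims(n, h_out, w_out, c_out, c_in, kh, kw):
    m = n * h_out * w_out        # one GEMM row per output pixel
    n_dim = c_out                # one GEMM column per output channel
    k = c_in * kh * kw           # reduction over the filter window
    return m, n_dim, k

def pad_to_multiple(x, multiple=16):
    """Round up so the reduction dimension is a multiple of the vector/intrinsic width."""
    return int(math.ceil(x / multiple)) * multiple

m, n_dim, k = igemm_dims(n=1, h_out=56, w_out=56, c_out=64, c_in=3, kh=7, kw=7)
print(f"IGEMM GEMM shape: M={m}, N={n_dim}, K={k}")          # K = 147, awkward to vectorize
print(f"K padded for vectorization: {pad_to_multiple(k)}")   # K = 160 after padding
```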
1.5 Data Tiling and Materialization Updates:
- Supported multi-result and interchanged generic materialization for linalg.generic ops. (https://github.com/iree-org/iree/pull/21416)
- Enabled scalable tile encoding materialization for mmt4d in data tiling (partial end-to-end support); see the packing sketch after this list. (https://github.com/iree-org/iree/pull/21304)
- Graduated data-tiling fusion from experimental flag to binding option, improving stability and control. (https://github.com/iree-org/iree/pull/21745)
- Improved fusion of encoding ops for multi-use, gather, and slice ops, reducing dispatch count and latency in models like llama fp8. (https://github.com/iree-org/iree/pull/21830)
- Implemented CPU data layout propagation optimizations for dispatches. (https://github.com/iree-org/iree/pull/21554)
- Supported partially enabling data-tiling via attribute hints. (https://github.com/iree-org/iree/pull/21676)
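As background for the mmt4d encoding work above, the sketch below shows the packing idea behind data tiling: a row-major matrix is rearranged into small contiguous tiles that match the micro-kernel's register blocking. The tile sizes here are fixed, illustrative values; the sizes actually used come from the materialized encoding and, per the change above, may be scalable.

```python
import numpy as np

def pack_mmt4d_lhs(a: np.ndarray, m0: int, k0: int) -> np.ndarray:
    """Pack a row-major (M, K) matrix into (M/m0, K/k0, m0, k0) tiles."""
    m, k = a.shape
    assert m % m0 == 0 and k % k0 == 0, "sketch assumes no padding is needed"
    # (M, K) -> (M/m0, m0, K/k0, k0) -> (M/m0, K/k0, m0, k0), inner tiles contiguous
    return a.reshape(m // m0, m0, k // k0, k0).transpose(0, 2, 1, 3).copy()

a = np.arange(8 * 8, dtype=np.float32).reshape(8, 8)
packed = pack_mmt4d_lhs(a, m0=4, k0=2)
print(packed.shape)   # (2, 4, 4, 2): outer tile grid, then contiguous 4x2 tiles
```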
1.6 Reduction and Vectorization Improvements:
- Added support for split-reduction tiling of multiple reduction dimensions (see the sketch after this list). (https://github.com/iree-org/iree/pull/21474)
- Enabled tiling of reduction dimensions for non-root reduction ops on CPU to avoid oversized vector sizes in softmax dispatches. (https://github.com/iree-org/iree/pull/21500)
- Added vector distribution pattern for iree_linalg_ext.map_scatter to support efficient vectorization. (https://github.com/iree-org/iree/pull/21124)
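The split-reduction change above tiles a large reduction into independent partial reductions plus a small combine step; the NumPy sketch below shows the idea for two reduction dimensions, with arbitrary shapes and tile sizes.

```python
import numpy as np

def split_reduce_sum(x: np.ndarray, tile_k0: int, tile_k1: int) -> float:
    """Sum a 2-D array via independent per-tile partial sums, then a final combine."""
    k0, k1 = x.shape
    assert k0 % tile_k0 == 0 and k1 % tile_k1 == 0
    # Partial reductions: one independent sum per (tile_k0, tile_k1) block.
    partials = (x.reshape(k0 // tile_k0, tile_k0, k1 // tile_k1, tile_k1)
                  .sum(axis=(1, 3)))
    # Final (much smaller) reduction combines the partial results.
    return partials.sum()

x = np.random.rand(128, 64)
assert np.isclose(split_reduce_sum(x, tile_k0=32, tile_k1=16), x.sum())
```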
1.7 Codegen and Canonicalization:
- Added IREE-specific canonicalizer pass to fold full subviews using shape-aware interfaces. (https://github.com/iree-org/iree/pull/21456)
- Folded bitcasts of inner tensor dimensions into binding.subspan to eliminate extra bufferization. (https://github.com/iree-org/iree/pull/21443)
- Fixed dominance issues to unblock consumer fusion by repositioning consumers before fusion. (https://github.com/iree-org/iree/pull/21551)
- Improved early bufferized padding codegen to avoid inefficient iteration over non-padded tensor regions. (https://github.com/iree-org/iree/pull/21694)
1.8 CPU Pipeline and Lowering Config:
- Refreshed CPU pipeline verification to rely on LoweringConfigAttr, removing TilingConfig dependency. (https://github.com/iree-org/iree/pull/21541)
- Dropped empty tile sizes from lowering configs to simplify IR and remove redundant info. (https://github.com/iree-org/iree/pull/21542)
- Improved and renamed TileAndFuseProducerConsumer pass with enhanced anchoring options; deprecated older passes accordingly. (https://github.com/iree-org/iree/pull/21674)
1.9 Python Bindings & Tuner:
- Exposed IREE’s VirtualMMAIntrinsicAttr and VirtualMMAAttr to Python, enabling the tuner to enumerate all MMA options, including virtual ones. (https://github.com/iree-org/iree/pull/21403)
- Added Python binding for MMA single subgroup layout inference. (https://github.com/iree-org/iree/pull/21454)
2. Runtime
- Introduced splat parameter generation for stream.named.parameters, changing the compiler interface; downstream maintainers should review and adapt their projects accordingly. (https://github.com/iree-org/iree/pull/21684)
- Added iree_hal_device_queue_dispatch for more efficient single dispatch execution without command buffer overhead. (https://github.com/iree-org/iree/pull/21630)
- Unified dispatch and indirect dispatch APIs with extended configuration support. (https://github.com/iree-org/iree/pull/21627)
- Added iree_hal_executable_export_info_t and related queries: the CPU-side HAL executable format has changed, so existing artifacts must be recompiled and the compiler and runtime kept in sync; version errors indicate they are out of sync. (https://github.com/iree-org/iree/pull/21754)
- Added iree_hal_executable_cache_infer_format: the GPU-side HAL executable formats have changed, so existing artifacts must be recompiled. (https://github.com/iree-org/iree/pull/21763)
Note: there is a known issue where IREE may miscompile some matmul dispatches on RDNA4.
Change Log
Git History
* Fix string parsing of i8 and i16 cl values by @qedawkins in https://github.com/iree-org/iree/pull/21409 * [Codegen] Rename `thread_basis` to `lane_basis`. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21412 * [Codegen] Don't add map_scatter for only reshapes by @Max191 in https://github.com/iree-org/iree/pull/21414 * [Encoding] Remove ambiguity from encoding propagation interface methods by @Max191 in https://github.com/iree-org/iree/pull/21415 * [Dispatch Creation] Fuse bit-truncate ops with producers by @IanWood1 in https://github.com/iree-org/iree/pull/21346 * [LLVMCPU] Populate fp4 expansion patterns on CPUs by @krzysz00 in https://github.com/iree-org/iree/pull/21413 * [Codegen] Materialize 0D set_encoding into no-op by @Max191 in https://github.com/iree-org/iree/pull/21418 * [Codegen][Tuner] add python binding for VirtualMMAIntrinsic by @bangtianliu in https://github.com/iree-org/iree/pull/21403 * Integrate LLVM to llvm/llvm-project@5f53182 by @bangtianliu in https://github.com/iree-org/iree/pull/21408 * [CPU] Propagate cache tiling sizes in lowering config propagation. by @hanhanW in https://github.com/iree-org/iree/pull/21410 * [DispatchCreation] Fuse reshape op chains along with set_encoding ops by @Max191 in https://github.com/iree-org/iree/pull/21365 * Integrate LLVM at 92c55a3 by @bjacob in https://github.com/iree-org/iree/pull/21429 * [DT][SVE] DT support for scalable tiles - encoding materialization for mmt4d by @egebeysel in https://github.com/iree-org/iree/pull/21304 * [CPU] Use lowering config attribute interface in LLVMCPUTileAndFuse. by @hanhanW in https://github.com/iree-org/iree/pull/21405 * Simplify the resolution of `scf.forall` created by split reductions. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21422 * [Codegen] Support multi-result and interchanged generic materialization by @Max191 in https://github.com/iree-org/iree/pull/21416 * Removing iree/base/internal/file_io.h by migrating to file handle. by @benvanik in https://github.com/iree-org/iree/pull/21411 * [Codegen] Collect slices to fuse producers of loop destinations into lane foralls. by @YashDeshpande25 in https://github.com/iree-org/iree/pull/21432 * [TensorExt] Add inliner interface by @qedawkins in https://github.com/iree-org/iree/pull/21437 * [codegen] use vector.broadcast instead of vector.splat by @newling in https://github.com/iree-org/iree/pull/21435 * [Dispatch Creation] Don't place bit-truncate in consumer dispatch by @IanWood1 in https://github.com/iree-org/iree/pull/21379 * [Codegen] Fold bitcastss of inner dimensions into binding.subspan by @krzysz00 in https://github.com/iree-org/iree/pull/21443 * [NFC] removing debug statement by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/21446 * [CPU][NFC] Update pack ops to not carry artificial padding. by @hanhanW in https://github.com/iree-org/iree/pull/21440 * [NFC][LLVMGPU] Move intrinsic sorting to deduceMMASchedule by @Groverkss in https://github.com/iree-org/iree/pull/21447 * [Codegen] Add llvm_unreachable for unhandled WorkgroupId cases by @KyleHerndon in https://github.com/iree-org/iree/pull/21442 * Fix an issue in ReferencePatitioning. by @AWoloszyn in https://github.com/iree-org/iree/pull/21343 * Integrate LLVM at aa1b416 by @raikonenfnu in https://github.com/iree-org/iree/pull/21455 * Bump version to 3.7.0 after 3.6.0 release. 
by @sa-faizal in https://github.com/iree-org/iree/pull/21460 * [Codegen][Tuner]: expose python binding for mma single subgroup layout by @bangtianliu in https://github.com/iree-org/iree/pull/21454 * [Codegen] Add canonicalizer with IREE codegen specific patterns by @Max191 in https://github.com/iree-org/iree/pull/21456 * Update regression tests to not have artificial padding. by @hanhanW in https://github.com/iree-org/iree/pull/21436 * [NFC] Switch dynamic inputs to flow.tensor.dynamic_constant. by @hanhanW in https://github.com/iree-org/iree/pull/21461 * Integrate LLVM at [8fff23] by @bjacob in https://github.com/iree-org/iree/pull/21463 * [Codegen][LLVMGPU] Add fallback patterns for fp4/f8E8M0FNU handling by @krzysz00 in https://github.com/iree-org/iree/pull/21453 * [Flow] Add support for moving operations with dependencies into dispatch regions by @jtuyls in https://github.com/iree-org/iree/pull/21399 * [GPU] Sort intrinsic pairs for attention configuration by @Groverkss in https://github.com/iree-org/iree/pull/21448 * Integrate LLVM at 1c3e4e99 by @bjacob in https://github.com/iree-org/iree/pull/21476 * [LinalgExt] Fix reshape fusion crash by @IanWood1 in https://github.com/iree-org/iree/pull/21472 * [HAL][AMDGPU] Use doorbell handle in iree_amd_make_cached_queue by @atgutier in https://github.com/iree-org/iree/pull/21479 * [Codegen][AMDGPU] Resolve swizzling hints with GatherToLDSOp by @lialan in https://github.com/iree-org/iree/pull/21478 * [build] Add GFX ARCH type to bitcode file names by @atgutier in https://github.com/iree-org/iree/pull/21484 * [DT][NFC] Unified DEBUG_TYPE for encoding materialization implementations. by @hanhanW in https://github.com/iree-org/iree/pull/21480 * [Codegen] Add SwapExtractWithCollapsePattern by @yzhang93 in https://github.com/iree-org/iree/pull/21419 * [DT][VMVX][NFC] Rename and update VMVX encoding materialization tests. by @hanhanW in https://github.com/iree-org/iree/pull/21488 * [GPU][NFC] Delete unused legacy LLVMGPUTensorPad pass. by @hanhanW in https://github.com/iree-org/iree/pull/21489 * [LinalgExt] Add pattern to make attention more static by @IanWood1 in https://github.com/iree-org/iree/pull/21481 * [DispatchCreation] Add unset encoding through generic propagation by @jtuyls in https://github.com/iree-org/iree/pull/21426 * [Codegen][LLVMGPU] Use inner reduction lowering for multi_reduction by @kuhar in https://github.com/iree-org/iree/pull/21486 * [Codegen][LLVMGPU] Remove math scalarization patterns by @krzysz00 in https://github.com/iree-org/iree/pull/21490 * [LinalgExt] Adding lowering to inner_tiled ops for contraction like ops with scales by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/21358 * Integrate LLVM at dc58a08 by @bjacob in https://github.com/iree-org/iree/pull/21494 * [CPU] Add CombineLayoutTransformation passes after distribution passes. by @hanhanW in https://github.com/iree-org/iree/pull/21444 * [HAL] Allow HAL dialect to store attributes in properties structs by @krzysz00 in https://github.com/iree-org/iree/pull/21485 * Add support for split-reduction-tiling of multiple reduction dimensions. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21474 * [NFC] Followup from https://github.com/iree-org/iree/pull/21474 by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21498 * [Flow][NFC] Fix deprecation warnings for ArrayRef(std::nullopt). 
by @hanhanW in https://github.com/iree-org/iree/pull/21502 * Integrate LLVM at 9e09c4d by @bjacob in https://github.com/iree-org/iree/pull/21495 * [CPU] adjust CPUPrepareUKernelsPass to accept iree_cpu.lowering by @egebeysel in https://github.com/iree-org/iree/pull/21493 * [DispatchCreation] Ensure that the dynamic quantized kernel gets fused into a single dispatch by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21492 * [CPU] Skip distribution passes if the tile sizes are known as zeros. by @hanhanW in https://github.com/iree-org/iree/pull/21508 * Integrate LLVM at llvm/llvm-project@1381ad497b9a. by @hanhanW in https://github.com/iree-org/iree/pull/21510 * [Codegen] Add infra for lowering MLIR ukernels based on descriptors by @jtuyls in https://github.com/iree-org/iree/pull/21428 * [TensorExt] Add folder for bitcast(tensor.cast) by @qedawkins in https://github.com/iree-org/iree/pull/21507 * [Codegen][Util] Remove TiedOpInterface implementation from IREE::Codegen::InnerTiledOp. by @hanhanW in https://github.com/iree-org/iree/pull/21517 * [Codegen] Remove to_buffer from bufferization deny list by @qedawkins in https://github.com/iree-org/iree/pull/21505 * [CPU] Switch CPUDefault pipeline to use IREE::CPU::LoweringConfigAttr. by @hanhanW in https://github.com/iree-org/iree/pull/21515 * [CPU] Switch CPUDoubleTilingExpert pipeline to use IREE::CPU::LoweringConfigAttr. by @hanhanW in https://github.com/iree-org/iree/pull/21354 * [Codegen] Add pattern to bubble bitcast past extract_slice by @qedawkins in https://github.com/iree-org/iree/pull/21518 * [CPU] Tile reduction dimensions for non-root reduction ops. by @hanhanW in https://github.com/iree-org/iree/pull/21500 * [HAL] Add hal.allocator.resolve_memory_properties by @ziereis in https://github.com/iree-org/iree/pull/21115 * Integrate LLVM at llvm/llvm-project@a28e7f1aad3e by @hanhanW in https://github.com/iree-org/iree/pull/21520 * [CPU] Convert accumulating GEMMs to GEMMs. by @hanhanW in https://github.com/iree-org/iree/pull/21473 * Migrate existing mi300 runners to new mi325 capacity. by @deedongala in https://github.com/iree-org/iree/pull/21523 * [Codegen][NFC] Switch to new LDBG macro. by @hanhanW in https://github.com/iree-org/iree/pull/21525 * [Dispatch Creation] Run multi-use fusion after forming dispatches by @IanWood1 in https://github.com/iree-org/iree/pull/21524 * Integrate stablehlo at openxla/stablehlo@69d6dae46e by @hanhanW in https://github.com/iree-org/iree/pull/21529 * [LinalgExt] Use IndexingMapOpInterface for attention by @IanWood1 in https://github.com/iree-org/iree/pull/21469 * [CPU][NFC] Switch existing tests to use IREE::CPU::LoweringConfig. by @hanhanW in https://github.com/iree-org/iree/pull/21516 * [docs] Add a configuration example for ROCm/HIP targets. 
by @hanhanW in https://github.com/iree-org/iree/pull/21535 * [Codegen][GPU] Tile fully dynamic root ops to the subgroup size by @krzysz00 in https://github.com/iree-org/iree/pull/21526 * [NFC] Add a dev flag to not do reduction vector distribution by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21532 * [Dispatch] Fix return in multiuse fusion by @IanWood1 in https://github.com/iree-org/iree/pull/21536 * Integrate LLVM at llvm/llvm-project@8e9a0fc0f2e5 by @hanhanW in https://github.com/iree-org/iree/pull/21533 * [LLVMGPU][Codegen] Set FMF for arith.mulf + arith.addf -> math.fma by @efric in https://github.com/iree-org/iree/pull/21528 * [GPU] Add col_major optional attribution to VirtualMMAAttr by @bangtianliu in https://github.com/iree-org/iree/pull/21537 * Pattern to hoist pack unpack ops from scf.for op by @YashDeshpande25 in https://github.com/iree-org/iree/pull/21431 * [Codegen][Encoding] Fix generic op materialization with 0D tensors by @Max191 in https://github.com/iree-org/iree/pull/21545 * [CPU] Refresh CPU pipeline verification. by @hanhanW in https://github.com/iree-org/iree/pull/21541 * [CPU] Drop empty tile sizes from lowering config. by @hanhanW in https://github.com/iree-org/iree/pull/21542 * [DT] Perform vectorization if the value is defined by scf.for by @Abhishek-Varma in https://github.com/iree-org/iree/pull/21543 * iree/runtime: iree-cpuinfo: add SME/SVE feature checks for ARM64 macOS by @Manewing in https://github.com/iree-org/iree/pull/21427 * [iree][gpu] Add LLVM func attributes when setting lowering attention config and change default MNTile seed by @fabianmcg in https://github.com/iree-org/iree/pull/21547 * [Codegen] Fix dominance issues blocking consumer fusions by @Max191 in https://github.com/iree-org/iree/pull/21551 * Revert "[iree][gpu] Add LLVM func attributes when setting lowering attention config and change default MNTile seed" by @fabianmcg in https://github.com/iree-org/iree/pull/21561 * Fix parentheses warning in ireeGPUGetSingleSubgroupLayout by @jtuyls in https://github.com/iree-org/iree/pull/21558 * Integrate LLVM at llvm/llvm-project@1194353 by @jtuyls in https://github.com/iree-org/iree/pull/21559 * [CPU][NFCI] Drop the use of TilingConfig from pipeline. by @hanhanW in https://github.com/iree-org/iree/pull/21556 * [Codegen] Add padding for convolutions before IGEMM by @yzhang93 in https://github.com/iree-org/iree/pull/21470 * [LinalgExt] Implement unit dim folding pattern for map_scatter by @Max191 in https://github.com/iree-org/iree/pull/21563 * [GPU] Add vector distribution pattern for map_scatter by @Max191 in https://github.com/iree-org/iree/pull/21124 * [Codegen] Fix dynamic tensor ukernel descriptor lowering by @jtuyls in https://github.com/iree-org/iree/pull/21570 * [Codegen][e2e testing] Add regression tests of matvec with dynamic reduction by @newling in https://github.com/iree-org/iree/pull/21538 * [VMVX] Migrate VMVX backend to use IREE::CPU::LoweringConfigAttr. by @hanhanW in https://github.com/iree-org/iree/pull/21566 * [CPU][NFC] Migrate TilingConfig to interface methods in split reduction pass. by @hanhanW in https://github.com/iree-org/iree/pull/21564 * [Codegen] Introduce lowering config interface methods for vectorization. by @hanhanW in https://github.com/iree-org/iree/pull/21555 * [CPU][NFC] Migrate TilingConfig to interface methods in LLVMCPU2DScalableTo1DScalable pass. 
by @hanhanW in https://github.com/iree-org/iree/pull/21565 * [CPU] Drop TilingConfig from KernelDispatch.cpp by @hanhanW in https://github.com/iree-org/iree/pull/21567 * [CPU][NFC] Delete TilingConfig. by @hanhanW in https://github.com/iree-org/iree/pull/21568 * [CPU][DT] Implement data layout propagation for CPU dispatches. by @hanhanW in https://github.com/iree-org/iree/pull/21554 * [Codegen][IGEMM] Fix pre-padding for group convolutions by @yzhang93 in https://github.com/iree-org/iree/pull/21583 * [CPU][AArch64][Test] Add more tests for encoding materialisation by @banach-space in https://github.com/iree-org/iree/pull/21560 * [ROCM] Add ukernel descriptor PDL pattern infra by @jtuyls in https://github.com/iree-org/iree/pull/21572 * Integrate LLVM at llvm/llvm-project@215e6beae02334 by @hanhanW in https://github.com/iree-org/iree/pull/21576 * [DT] Add support for materializing func.func and func.return op. by @hanhanW in https://github.com/iree-org/iree/pull/21582 * [Codegen][GPU] Adding heuristic strategy to reduce tile size to fill workloads to all CUs by @jerryyin in https://github.com/iree-org/iree/pull/21546 * Register `VectorExt` Dialect in LLVMCPUTarget by @NoumanAmir657 in https://github.com/iree-org/iree/pull/21593 * [Codegen] Refactor CombineLayoutTransformation with scope options by @Max191 in https://github.com/iree-org/iree/pull/21577 * [Codegen] Cater to bitwidth of largest operand in reduction by @kuhar in https://github.com/iree-org/iree/pull/21438 * [DT] Fix a bug in encoding propagation when there are scalar inputs. by @hanhanW in https://github.com/iree-org/iree/pull/21596 * LDBG fixes for "[Codegen] Cater to bitwidth of largest operand in reduction" by @hanhanW in https://github.com/iree-org/iree/pull/21601 * [LLVMGPU] Support map_scatter in LLVMGPUVectorDistribute pipeline by @Max191 in https://github.com/iree-org/iree/pull/21595 * [Codegen] linalg.generic with dynamic reduction dim: use `LLVMGPUVectorDistribution`. by @newling in https://github.com/iree-org/iree/pull/21430 * [Dispatch Creation] Fuse pad with generic conv consumer by @IanWood1 in https://github.com/iree-org/iree/pull/21606 * Integrate llvm-project@cfd1ee781f by @krzysz00 in https://github.com/iree-org/iree/pull/21598 * [CODEGEN] Remove special case logic for poison padding by @newling in https://github.com/iree-org/iree/pull/21574 * [CODEGEN] Allow pack-unpack pairs to be hoisted through multiple forOps by @YashDeshpande25 in https://github.com/iree-org/iree/pull/21569 * [Codegen][VectorDistribute] Add pattern to distribute poison by @newling in https://github.com/iree-org/iree/pull/21573 * [Codegen] Refactor CombineLayoutTransformation to use patterns by @Max191 in https://github.com/iree-org/iree/pull/21592 * [ROCM] Add support for multiple-of/bounds PDL constraints by @jtuyls in https://github.com/iree-org/iree/pull/21578 * Integrate llvm-project@351b38f2 by @krzysz00 in https://github.com/iree-org/iree/pull/21609 * Update the logic for resolve `scf.forall` to account for maximum number of workgroups. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21584 * [CPU] Re-enable math tests for RISC-V targets. by @hanhanW in https://github.com/iree-org/iree/pull/21608 * [NFC] Trim compile flags from GPU sharktank tests. 
by @hanhanW in https://github.com/iree-org/iree/pull/21617 * [Codegen][ROCDL] Add test to ensure fp4 truncation is packed by @krzysz00 in https://github.com/iree-org/iree/pull/21553 * [Codegen] Add CPU e2e tests for fp4 conversions by @krzysz00 in https://github.com/iree-org/iree/pull/21445 * [DispatchCreation] Allow more encoding op fusions by @Max191 in https://github.com/iree-org/iree/pull/21612 * [Codegen] Fix return of non-owning reference in CombineLayoutTransformation by @Max191 in https://github.com/iree-org/iree/pull/21618 * [CPU] Use default flags + iree-opt-level in sharktank tests. by @hanhanW in https://github.com/iree-org/iree/pull/21607 * Cleanup the way config values are retrieved. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21610 * [GPU][Codegen] Distribute to single subgroup for large parallel dimension in reduction by @efric in https://github.com/iree-org/iree/pull/21499 * Revert "[Codegen] Fix dominance issues blocking consumer fusions (#21…551)" by @Max191 in https://github.com/iree-org/iree/pull/21632 * [ROCM] Add PDL pattern driver for embedding ukernels by @jtuyls in https://github.com/iree-org/iree/pull/21591 * [GPU] Add pass to tile convolution operations to matmul by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21552 * [GPU][NFCI] Make dot/mma field optional and trim the IR. by @hanhanW in https://github.com/iree-org/iree/pull/21626 * Reapply "[Codegen] Fix dominance issues blocking consumer fusions (#21551)" by @Max191 in https://github.com/iree-org/iree/pull/21637 * [GPU][NFC] Deprecate iree-codegen-gpu-native-math-precision flag. by @hanhanW in https://github.com/iree-org/iree/pull/21636 * Bump llvm/torch-mlir@46925eb by @zjgarvey in https://github.com/iree-org/iree/pull/21628 * [GPU][DT] dce unused tensor.dim ops in SpecializeExports by @jtuyls in https://github.com/iree-org/iree/pull/21624 * Integrate llvnm-project@ff616a19 by @krzysz00 in https://github.com/iree-org/iree/pull/21614 * [docs] Add documentation for updating golden outputs by @efric in https://github.com/iree-org/iree/pull/21641 * Unifying dispatch/dispatch_indirect and adding extended configuration. by @benvanik in https://github.com/iree-org/iree/pull/21627 * Adding iree_hal_device_queue_dispatch. by @benvanik in https://github.com/iree-org/iree/pull/21630 * [LLVMCPU] Fix llvmcpu check before conversion for complex types by @castigli in https://github.com/iree-org/iree/pull/21644 * [Codegen] Update tests to be in correct state for strategy selection by @newling in https://github.com/iree-org/iree/pull/21647 * Toggle the default option to false for pre-padding convolution flag by @yzhang93 in https://github.com/iree-org/iree/pull/21579 * Update workgroup count op syntax by @rkayaith in https://github.com/iree-org/iree/pull/21656 * Integrate llvm/llvm-project@1ffc38ca4 by @kuhar in https://github.com/iree-org/iree/pull/21658 * [Codegen] Select for pad value just before yielding by @newling in https://github.com/iree-org/iree/pull/21581 * Free buffers synchronously if async caching is disabled. by @AWoloszyn in https://github.com/iree-org/iree/pull/21668 * [DT][NFCI] Switch SetEncoding pass to walk-based pass. 
by @hanhanW in https://github.com/iree-org/iree/pull/21662 * Bump the github-actions group with 3 updates by @dependabot[bot] in https://github.com/iree-org/iree/pull/21655 * Add support for ml_dtypes to python runtime bindings by @rsuderman in https://github.com/iree-org/iree/pull/21549 * Integrate llvm/llvm-project@8071d279 by @kuhar in https://github.com/iree-org/iree/pull/21669 * [LLVMCPU] Tracks the dimension mapping for multi lowering config by @Yu-Zhewen in https://github.com/iree-org/iree/pull/21649 * [CPU][NFC] Improve code quality and make few methods local. by @hanhanW in https://github.com/iree-org/iree/pull/21673 * Bump llvm/torch-mlir@155680c by @vivekkhandelwal1 in https://github.com/iree-org/iree/pull/21680 * [DT][CPU] Exclude pack ops with reshape producers from lowering config setting by @Yu-Zhewen in https://github.com/iree-org/iree/pull/21675 * [ROCM] Add ukernel descriptor lowering to pipeline by @jtuyls in https://github.com/iree-org/iree/pull/21634 * Fix workgroup_count_from_slice assembly format in test by @jtuyls in https://github.com/iree-org/iree/pull/21685 * [Codegen][GPU] Use arithmetic intensity to guide gemm size categorization - Step 1 by @jerryyin in https://github.com/iree-org/iree/pull/21638 * Integrate llvm/llvm-project@0ff92fe2f by @kuhar in https://github.com/iree-org/iree/pull/21689 * [DispatchCreation] Drop unit dims for flow.parameter.named by @Groverkss in https://github.com/iree-org/iree/pull/21687 * [DT] Set encodings if `iree.opt.data_tiling` unit attribute is attached. by @hanhanW in https://github.com/iree-org/iree/pull/21676 * Add `ChipDetails` definition for MI350X and MI355X target. by @amd-eochoalo in https://github.com/iree-org/iree/pull/21690 * Numerical tests: softmax with dynamic reduction size by @newling in https://github.com/iree-org/iree/pull/21594 * Integrate llvm/llvm-project@9a14b1d254a by @kuhar in https://github.com/iree-org/iree/pull/21702 * [Codegen] Skip scalar ops in large tensor tiling pass by @qedawkins in https://github.com/iree-org/iree/pull/21704 * Revert "[codegen][gpu] Add the `iree-rocdl-use-buffer-instructions` pass (#21335)" by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21695 * [docs] Fix a typo and attach the pass link in tuning.md by @hanhanW in https://github.com/iree-org/iree/pull/21707 * [Codegen][LLVMGPU] Config tests for matmuls by @newling in https://github.com/iree-org/iree/pull/21697 * [Codegen] Use vector distribute for softmax with dynamic reduction size by @newling in https://github.com/iree-org/iree/pull/21650 * Work around gcc bug. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21711 * Move windows builds to experimental to unblock release packages. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21712 * Fix misc coding issues. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21713 * Exclude broken ninja version for Windows package builds. by @ScottTodd in https://github.com/iree-org/iree/pull/21717 * Disable failing ARM-SME tests. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21715 * Use range small vector constructors. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21719 * Expose loops transforms through python api by @Hardcode84 in https://github.com/iree-org/iree/pull/21710 * [Integrate] Drop LLVM revert of "Remove matmul_transpose variants" by @hanhanW in https://github.com/iree-org/iree/pull/21344 * Drop needless template parameters from patterns. NFC. 
by @kuhar in https://github.com/iree-org/iree/pull/21721 * Revert "Move windows builds to experimental to unblock release packages." by @ScottTodd in https://github.com/iree-org/iree/pull/21723 * [DT] Drop the data-tiling hint after encodings are set. by @hanhanW in https://github.com/iree-org/iree/pull/21724 * [ROCM] Readd SpecializeExports pass by @qedawkins in https://github.com/iree-org/iree/pull/21727 * Fix SmallVector conversion error with gcc by @jtuyls in https://github.com/iree-org/iree/pull/21725 * Bump sarisia/actions-status-discord from 1.15.3 to 1.15.4 in the github-actions group by @dependabot[bot] in https://github.com/iree-org/iree/pull/21730 * Integrate llvm/llvm-project@6fc1deb8b749 by @hanhanW in https://github.com/iree-org/iree/pull/21732 * Remove myself from samples/ CODEOWNERS. by @ScottTodd in https://github.com/iree-org/iree/pull/21726 * [Codegen] Improve early bufferized padding codegen by @Max191 in https://github.com/iree-org/iree/pull/21694 * [CPU] Improve TileRootAndFuseProducerConsumer pass and deprecate TileAndFuse pass. by @hanhanW in https://github.com/iree-org/iree/pull/21674 * Apply UnsignedWhenEquivalent at the ModuleOp level. by @amd-eochoalo in https://github.com/iree-org/iree/pull/21743 * Integrate LLVM at llvm/llvm-project@c65c0e87fc73 by @hanhanW in https://github.com/iree-org/iree/pull/21744 * Integrate LLVM at [bfab80] by @Groverkss in https://github.com/iree-org/iree/pull/21747 * Adding semaphore creation and wait flags for controlling behavior. by @benvanik in https://github.com/iree-org/iree/pull/21619 * Adding iree_hal_device_queue_host_call and emulation. by @benvanik in https://github.com/iree-org/iree/pull/21653 * Fixing merge conflict from [#21619] + [#21653]. by @benvanik in https://github.com/iree-org/iree/pull/21751 * [ConstEval] Do not jit parameterized flow.tensor.constants by @Groverkss in https://github.com/iree-org/iree/pull/21748 * [Dispatch] CollapseDims for extract_slice and scf.forall by @IanWood1 in https://github.com/iree-org/iree/pull/21708 * [Codegen] Add matmul and batched matmul to list of ops to generalize by @newling in https://github.com/iree-org/iree/pull/21720 * [NFC] Moving iree_hal_amdgpu_bitmap to iree/base/internal/. by @benvanik in https://github.com/iree-org/iree/pull/21666 * Temporarily disable the circular buffer for parameter uploads. by @AWoloszyn in https://github.com/iree-org/iree/pull/21758 * [RISCV] Remove unused cmake variables. by @HanKuanChen in https://github.com/iree-org/iree/pull/21746 * Adding IREE_HAL_COMMAND_BUFFER_MODE_UNRETAINED flag. by @benvanik in https://github.com/iree-org/iree/pull/21755 * [DT] Graduate data-tiling fusion from experimental flag to binding option. 
by @hanhanW in https://github.com/iree-org/iree/pull/21745 * [ROCM] Port mlir ukernels to ukernel descriptor lowering flow by @jtuyls in https://github.com/iree-org/iree/pull/21683 * [Codegen] PV and QK matmul's must have same acc layout by @newling in https://github.com/iree-org/iree/pull/21729 * [DispatchCreation] Fix trailing unit dims case for collapse of expand folding by @dan-garvey in https://github.com/iree-org/iree/pull/21677 * [Codegen] Add corner case for SwapExtractWithCollapsePattern by @yzhang93 in https://github.com/iree-org/iree/pull/21773 * [ROCM] Fix redefinition of symbol error for including tensor ukernels by @jtuyls in https://github.com/iree-org/iree/pull/21780 * [Codegen][IGEMM] Fix and preserve padding dim order for convs by @yzhang93 in https://github.com/iree-org/iree/pull/21772 * [ROCM] Update Ukernel infra to allow ROCM-specific bitcode ukernel lowering by @Abhishek-Varma in https://github.com/iree-org/iree/pull/21681 * [Codegen] Add XOR-based Swizzle Attribute by @sebvince in https://github.com/iree-org/iree/pull/21562 * [GPU][DT] Fix matmul narrow dim selection by @Yu-Zhewen in https://github.com/iree-org/iree/pull/21764 * [NFC] Remove debug messages by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/21768 * Integrate LLVM at llvm/llvm-project@4b84223aad4f by @IanWood1 in https://github.com/iree-org/iree/pull/21791 * [Codegen][Tuner] expose python binding to query target info by @bangtianliu in https://github.com/iree-org/iree/pull/21782 * [Codegen] Remove WarpReduction from ROCDL pipeline by @newling in https://github.com/iree-org/iree/pull/21795 * [Codegen][GPU] Use arithmetic intensity to guide gemm size categorization - step 2 by @jerryyin in https://github.com/iree-org/iree/pull/21691 * [Dispatch][GlobalOpt] Improve transpose fusion for conv by @IanWood1 in https://github.com/iree-org/iree/pull/21778 * [Codegen][LLVMGPU] Give ops same config irrespective of generalized/specialized by @newling in https://github.com/iree-org/iree/pull/21769 * Drop TensorCore/MMA pipelines. 
by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21741 * Integrate LLVM at llvm/llvm-project@f2e6ca805dbb by @IanWood1 in https://github.com/iree-org/iree/pull/21805 * [Codegen][GPU] Adding new heuristics to take all dimensions into account when distributing tiles by @jerryyin in https://github.com/iree-org/iree/pull/21803 * [GPU] Add pattern to sink extract_slice through generic ops by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21796 * [ROCM] Add zero fill check to ukernel patterns by @jtuyls in https://github.com/iree-org/iree/pull/21793 * [GPU][DT] Fix LHS operand offset calculation for DataTiledMMAAttr by @Yu-Zhewen in https://github.com/iree-org/iree/pull/21808 * [VectorDistribute] Correctly find new dimensions during reduction config by @Groverkss in https://github.com/iree-org/iree/pull/21797 * [VectorDistribute] Do not handle bit extend during matmul configuration by @Groverkss in https://github.com/iree-org/iree/pull/21798 * [codegen] more consumer fusion by @ftynse in https://github.com/iree-org/iree/pull/21521 * Move ROCM tests to fix dialect not registered error by @jtuyls in https://github.com/iree-org/iree/pull/21811 * Migrate ROCM ukernels from tuning spec to ukernel descriptor lowering by @jtuyls in https://github.com/iree-org/iree/pull/21794 * [Codegen] Rewrite test so LLVMGPUWarpReduction is not used by @newling in https://github.com/iree-org/iree/pull/21770 * [LinalgExt][NFC] Delete duplicated SingleBlockImplicitTerminator trait. by @hanhanW in https://github.com/iree-org/iree/pull/21818 * Revert "[codegen] more consumer fusion (#21521)" by @pravg-amd in https://github.com/iree-org/iree/pull/21819 * [Codegen][LLVMGPU] Remove LLVMGPUWarpReduction pipeline by @newling in https://github.com/iree-org/iree/pull/21821 * [codegen][rocdl] Remove ROCDLKernelConfig and ROCDLSelectLoweringStrategy by @fabianmcg in https://github.com/iree-org/iree/pull/21820 * Revert "[VectorDistribute] Correctly find new dimensions during reduction config" by @Groverkss in https://github.com/iree-org/iree/pull/21810 * Integrate LLVM at llvm/llvm-project@74275a11038c by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/21831 * [Codegen][GPU] Use arithmetic intensity to guide gemm size categorization - step 3 by @jerryyin in https://github.com/iree-org/iree/pull/21826 * [Hoisting] Fix the double-free issue in `HoistIntoGlobalsPass::cleanupDeadOp`. by @JerryShih in https://github.com/iree-org/iree/pull/21699 * [iree-test-suites] Add data tiling tests for LLAMA 8B by @Abhishek-Varma in https://github.com/iree-org/iree/pull/21832 * Integrate LLVM at llvm/llvm-project@9c7727c62af0 by @fabianmcg in https://github.com/iree-org/iree/pull/21835
New Contributors
- @atgutier made their first contribution in https://github.com/iree-org/iree/pull/21479
- @deedongala made their first contribution in https://github.com/iree-org/iree/pull/21523
- @Manewing made their first contribution in https://github.com/iree-org/iree/pull/21427
- @castigli made their first contribution in https://github.com/iree-org/iree/pull/21644
- @Yu-Zhewen made their first contribution in https://github.com/iree-org/iree/pull/21649
- @amd-eochoalo made their first contribution in https://github.com/iree-org/iree/pull/21690
Full Changelog: https://github.com/iree-org/iree/compare/v3.6.0...v3.7.0