| Name | Modified | Size | Downloads / Week |
| --- | --- | --- | --- |
| CUTLASS 4.1.0 source code.tar.gz | 2025-07-22 | 33.1 MB | |
| CUTLASS 4.1.0 source code.zip | 2025-07-22 | 41.7 MB | |
| README.md | 2025-07-22 | 2.3 kB | |
| Totals: 3 items | | 74.8 MB | 7 |

CuTe DSL

* Add aarch64 support; you can now `pip install nvidia-cutlass-dsl` on GB200 systems!
* More examples demonstrating how to use CuTe DSL to write peak-performance kernels:
  - Blackwell Mamba2 SSD
  - Blackwell SM100 persistent dense blockscaled GEMM with static scheduling
* API updates
  - Please refer to FUNCTIONALITY.md for details.
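As a quick sanity check before or after installing, the sketch below reports the host architecture and whether the `nvidia-cutlass-dsl` distribution is present. It uses only the Python standard library; the helper name `cutlass_dsl_status` is hypothetical and is not part of CUTLASS or CuTe DSL.

```python
import platform
from importlib import metadata


def cutlass_dsl_status(dist_name="nvidia-cutlass-dsl"):
    """Return (machine, installed_version_or_None) for a distribution.

    Hypothetical helper for illustration; not part of CUTLASS.
    """
    # platform.machine() typically reports "aarch64" on Linux Arm hosts
    # and "x86_64" on Linux x86 hosts.
    machine = platform.machine()
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        version = None
    return machine, version


if __name__ == "__main__":
    machine, version = cutlass_dsl_status()
    print(f"machine={machine}, nvidia-cutlass-dsl={version or 'not installed'}")
```

If the version comes back as `None`, running `pip install nvidia-cutlass-dsl` in the same environment should make it visible to this check.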

CUTLASS C++

* Further enhance Blackwell SM100 Attention kernels in example 77.
  - Add variable sequence length support for the FMHA Backward kernel.
  - Add varlen test support to the Backward runner.
  - Support empty batch sequences.
* Replace `subbyte_iterator` with `cute::recast_ptr` when constructing logical iterators/arrays.
* CuTe changes:
  - Rewrite `ArithTuple` and `ScaledBasis` for robustness and clarity.
  - Remove buggy and kludgy `get_layoutA|B|C_MN` and friends from Atoms/TiledX.
  - Factor out `print_latex` and friends and rewrite.
  - Factor out `print_svg` and friends and rewrite.
* Support Blackwell SM100 SIMT packed fp32x2 kernels.
* Support residual add for implicit GEMM kernels.
* Various fixes for the EVT tracer in the CUTLASS C++ Python interface:
  - Add a verifier for sm90 that reports invalid input.
  - When adding an edge that already exists in the graph, add an identity compute node to avoid multiple parallel edges.
  - Register the tanh, sigmoid, exp, and gelu operations with the Python AST frontend.
  - Replace the `NotImplementedError` with a fallback that packs all nodes into a single topological visitor node.
* Fix profiler bugs in exhaustive perf search.
  - Fix incorrect cluster shape output when doing exhaustive search.
  - Fix a bug in profiler grouped GEMM when setting tile scheduler swizzles, cluster shapes, and raster orders.
* Fix some profiler issues.
  - Complete the reference for Blackwell blockwise GEMM kernels.
  - Fix incorrect regex logic for the L1 test.
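The EVT tracer fix for duplicate edges can be pictured with a toy graph. The sketch below is not the tracer's actual code: `ToyGraph` and all of its names are hypothetical, and it only illustrates the general idea of routing a repeated `src -> dst` edge through a fresh identity node so that no two edges share the same endpoints.

```python
class ToyGraph:
    """Minimal directed graph that forbids parallel edges (illustration only)."""

    def __init__(self):
        self.edges = set()        # set of (src, dst) pairs
        self.node_ops = {}        # node name -> operation label
        self._identity_count = 0  # counter used to name identity nodes

    def add_node(self, name, op):
        self.node_ops[name] = op

    def add_edge(self, src, dst):
        """Add src -> dst; if it already exists, go through a new identity node.

        Returns the name of the inserted identity node, or None if the
        edge was added directly.
        """
        if (src, dst) not in self.edges:
            self.edges.add((src, dst))
            return None
        name = f"identity_{self._identity_count}"
        self._identity_count += 1
        self.add_node(name, "identity")
        # src -> identity -> dst replaces what would be a parallel edge.
        self.edges.add((src, name))
        self.edges.add((name, dst))
        return name


g = ToyGraph()
g.add_node("x", "load")
g.add_node("mul", "multiply")
g.add_edge("x", "mul")          # first operand of x * x
extra = g.add_edge("x", "mul")  # second operand reuses the same edge
print(extra, sorted(g.edges))
```

A set of `(src, dst)` pairs cannot represent two identical edges, which is one reason a graph structure might prefer an explicit identity node over a multigraph.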

Source: README.md, updated 2025-07-22