Name | Modified | Size
---|---|---
README.md | 2025-08-05 | 11.8 kB
v0.2.9 source code.tar.gz | 2025-08-05 | 1.4 MB
v0.2.9 source code.zip | 2025-08-05 | 1.9 MB
## What's Changed
- Reduce the JIT compilation time of gen_gemm_sm100_module by @jinyangyuan-nvidia in https://github.com/flashinfer-ai/flashinfer/pull/1251
- fix: correctly pass k_scale and v_scale to run() in forward_return_lse (#1023) by @vlev02 in https://github.com/flashinfer-ai/flashinfer/pull/1254 (see the attention-scaling sketch after this list)
- Made AR output optional + esthetic changes by @nvmbreughe in https://github.com/flashinfer-ai/flashinfer/pull/1265
- init add gemm fp8 using cudnn backend by @ttyio in https://github.com/flashinfer-ai/flashinfer/pull/1264
- Feature/sm100 low latency nvfp4 kernels by @azhurkevich in https://github.com/flashinfer-ai/flashinfer/pull/1214
- CI: install `nvidia-nvshmem-cu12` by @EmilienM in https://github.com/flashinfer-ai/flashinfer/pull/1262
- feat: enable trtllm-gen mla MTP by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/1258
- Add trtllm-gen attention mha kernel with FP8 Q/K/V and FP8 output by @weireweire in https://github.com/flashinfer-ai/flashinfer/pull/1242
- add trtllm-gen context attention by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/1239
- feat: add masked deepgemm support and benchmarking by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1266
- Add missing import in comm/init.py by @joker-eph in https://github.com/flashinfer-ai/flashinfer/pull/1275
- hotfix: fix deepgemm artifactory hash by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1278
- Unify groupwise fp8 GEMM test by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1281
- fix: update trtllm-gen fmha benchmark by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/1280
- fix multiCtasKvScratchPtr misalignment issue (new one) by @nvpohanh in https://github.com/flashinfer-ai/flashinfer/pull/1286
- Fix install folder regression, and JIT-vs-AOT differences by @directhex in https://github.com/flashinfer-ai/flashinfer/pull/1279
- Add shuffle matrix flag by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/1272
- Convert scale_factor from scalar to Tensor in trt_allreduce_fusion by @ilmarkov in https://github.com/flashinfer-ai/flashinfer/pull/1284
- patch error handling by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/1293
- Bug fix: guard fp8 e8m0 and e2m1 compile by @Edenzzzz in https://github.com/flashinfer-ai/flashinfer/pull/1287
- refactor: Improved metainfo for trtllm-gen fmha by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1292
- add mm_fp4 use cudnn backend by @ttyio in https://github.com/flashinfer-ai/flashinfer/pull/1288
- fix: minor errors in cubin loader by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/1295
- perf: use lightweight API to query device property by @azhurkevich in https://github.com/flashinfer-ai/flashinfer/pull/1298
- refactor: refactor trtllm-gen attention kernel integration code by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1289
- Remove FAST_BUILD FLAG for MOE by @wenscarl in https://github.com/flashinfer-ai/flashinfer/pull/1291
- bugfix: ensure graph is captured and executed on the same stream to avoid rep… by @elfiegg in https://github.com/flashinfer-ai/flashinfer/pull/1303
- minor: some fix and cleanup for trtllm-gen mha by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/1302
- [Feature] SM level profiler by @Edenzzzz in https://github.com/flashinfer-ai/flashinfer/pull/1305
- Heuristics + testing unification + CUDA Graphs by @azhurkevich in https://github.com/flashinfer-ai/flashinfer/pull/1306
- Update cutlass fp4 moe kernels by @wenscarl in https://github.com/flashinfer-ai/flashinfer/pull/1294
- Fix the bug of the kernel-selection heuristic in trtllm-gen by @PerkzZheng in https://github.com/flashinfer-ai/flashinfer/pull/1307
- Test qkvo quantization scales not equal to 1 by @weireweire in https://github.com/flashinfer-ai/flashinfer/pull/1314
- [fix] fix integer overflow in FA2 customized_mask & add buffer overflow warning. by @happierpig in https://github.com/flashinfer-ai/flashinfer/pull/1290
- Addition of flashinfer_benchmark.py for benchmarking routines by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/1323
- minor: update devcontainer by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/1329
- Fix redundant argument in TrtllmGenDecodeModule by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/1326
- Optimizations for TRTLLM MNNVL Allreduce by @timlee0212 in https://github.com/flashinfer-ai/flashinfer/pull/1321
- add torch float4_e2m1fn_x2 check for cudnn fp4 backend by @ttyio in https://github.com/flashinfer-ai/flashinfer/pull/1333
- only add cudnn dependency for x86 platform by @ttyio in https://github.com/flashinfer-ai/flashinfer/pull/1332
- Make Fp8 MoE routing_bias optional by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/1319
- feat: Add weight layout option for trtllm-gen fused moe by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/1297
- [Fix] remove torch 2.8 requirement for FP4 GEMM by @elfiegg in https://github.com/flashinfer-ai/flashinfer/pull/1334
- Bug fix: fix duplicate launch in POD by @Edenzzzz in https://github.com/flashinfer-ai/flashinfer/pull/1267
- Add blockwise-scaled FP8 GEMM via TRTLLM-Gen. by @sergachev in https://github.com/flashinfer-ai/flashinfer/pull/1320
- feat: support output nvfp4 in trtllm-gen function call. by @weireweire in https://github.com/flashinfer-ai/flashinfer/pull/1318
- Fix bench deepgemm setting by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1344
- fix: fix trtllm-gen mla error on new interface by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/1348
- [Bugfix] Change max_size for LRU by @elfiegg in https://github.com/flashinfer-ai/flashinfer/pull/1349
- Support loading autotuned results from json for cutlass fp4 moe backends by @kaixih in https://github.com/flashinfer-ai/flashinfer/pull/1310
- Refactor scripts in benchmarks to use flashinfer.testing.bench_gpu_time by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/1337 (see the timing sketch after this list)
- bugfix: Change default index in routingTopKExperts by @amirkl94 in https://github.com/flashinfer-ai/flashinfer/pull/1347
- Support passing kv_data_type to MultiLevelCascadeAttentionWrapper.plan() by @sarckk in https://github.com/flashinfer-ai/flashinfer/pull/1350
- Add trtllm-gen prefill test. Fix related wrapper issue. by @weireweire in https://github.com/flashinfer-ai/flashinfer/pull/1346
- feat: Support logits_soft_cap for Persistent attn; fix kv split limit by @Edenzzzz in https://github.com/flashinfer-ai/flashinfer/pull/1324
- chore: remove cpp benchmarks, tests, cmake path, as they are deprecated by @hypdeb in https://github.com/flashinfer-ai/flashinfer/pull/1345
- minor: add trtllm_gen_mla benchmark by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/1316
- cleanup: retire aot-build-utils by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1354
- minor: more informative error message for buffer overflow by @Edenzzzz in https://github.com/flashinfer-ai/flashinfer/pull/1357
- gen_trtllm_comm_module: fix device capability detection by @dtrifiro in https://github.com/flashinfer-ai/flashinfer/pull/1356
- Refactor Fused Moe Module by @wenscarl in https://github.com/flashinfer-ai/flashinfer/pull/1309
- Add native cudnn_decode for improved cudnn decode performance by @Anerudhan in https://github.com/flashinfer-ai/flashinfer/pull/1283
- Update CI docker container to use latest cudnn by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1362
- feature: add fp4 mm using trtllm backend by @ttyio in https://github.com/flashinfer-ai/flashinfer/pull/1355
- support trtllm-gen prefill fp4 output by @weireweire in https://github.com/flashinfer-ai/flashinfer/pull/1360
- Allow cudnn prefill kernels to be called natively by @Anerudhan in https://github.com/flashinfer-ai/flashinfer/pull/1317
- bugfix: fix ci for aot-compile by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1364
- feat: auto deduce use_oneshot from token_num in all-reduce by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/1365
- add cutlass backend for mm_fp4 by @ttyio in https://github.com/flashinfer-ai/flashinfer/pull/1296
- Support scale factor start index for fp4 mha prefill/decode by @weireweire in https://github.com/flashinfer-ai/flashinfer/pull/1363
- test: add cuda graph to comm test by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/1366
- ci: add requests to ci docker container by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1370
- Artifact downloading and single sourced artifact path by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1369
- [fix] remove (view) transpose to keep consistent with majorness MN requirement. by @elfiegg in https://github.com/flashinfer-ai/flashinfer/pull/1358
- hotfix: update mxfp4 groupwise-scaled gemm unittests by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1359
- bugfix: fixed cutlass fused moe usage of FP4QuantizationSFLayout::SWIZZLED by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1371
- ci: add blackwell unittest scripts by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1372
- Update documentation index by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1374
- bugfix: do cudnn related error check only when cudnn backend is enabled. by @ttyio in https://github.com/flashinfer-ai/flashinfer/pull/1377
- bugfix: Add guard for fp4/fp8 related include headers by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1376
- refactor: download trtllm gemm metadata from server by @ttyio in https://github.com/flashinfer-ai/flashinfer/pull/1378
- Fix sphinx error by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1380
- release: bump version to v0.2.9 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1381
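Several of the attention changes above deal with per-tensor KV-cache scaling. As a rough illustration of where `k_scale` and `v_scale` enter the math (the attention-scaling sketch referenced above), here is a minimal PyTorch reference attention. This is an illustrative sketch only, not FlashInfer kernel code; `reference_attention` and its arguments are hypothetical names.

```python
# Illustrative only: a tiny reference attention showing where per-tensor
# k_scale / v_scale enter the math when the KV cache is stored quantized.
# Not FlashInfer code; function and variable names are hypothetical.
import math
import torch

def reference_attention(q, k, v, k_scale=1.0, v_scale=1.0):
    # q: [num_heads, q_len, head_dim], k/v: [num_heads, kv_len, head_dim]
    sm_scale = 1.0 / math.sqrt(q.shape[-1])
    # k_scale folds into the softmax scale; v_scale rescales the output.
    scores = torch.einsum("hqd,hkd->hqk", q.float(), k.float()) * sm_scale * k_scale
    lse = torch.logsumexp(scores, dim=-1)  # log-sum-exp, as in the *_return_lse APIs
    probs = torch.softmax(scores, dim=-1)
    out = torch.einsum("hqk,hkd->hqd", probs, v.float()) * v_scale
    return out, lse

q = torch.randn(8, 1, 128)
k = torch.randn(8, 64, 128)
v = torch.randn(8, 64, 128)
out, lse = reference_attention(q, k, v, k_scale=0.5, v_scale=2.0)
print(out.shape, lse.shape)  # torch.Size([8, 1, 128]) torch.Size([8, 1])
```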
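The benchmark refactor toward a shared GPU-timing helper is easiest to picture with a CUDA-event timing loop. The sketch below (the timing sketch referenced above) shows the general pattern; it is not the actual `flashinfer.testing.bench_gpu_time` implementation, and `bench_gpu_time_sketch` is a hypothetical name.

```python
# Illustrative only: the kind of CUDA-event timing loop a shared GPU
# benchmarking helper typically wraps. Hypothetical sketch, not the
# actual flashinfer.testing.bench_gpu_time implementation.
import torch

def bench_gpu_time_sketch(fn, warmup=10, iters=100):
    """Return average milliseconds per call of fn(), measured with CUDA events."""
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # elapsed_time() reports milliseconds

if torch.cuda.is_available():
    a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    print(f"matmul: {bench_gpu_time_sketch(lambda: a @ b):.3f} ms")
```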
## New Contributors
- @vlev02 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1254
- @ttyio made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1264
- @azhurkevich made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1214
- @weireweire made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1242
- @IwakuraRein made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1239
- @nvpohanh made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1286
- @directhex made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1279
- @ilmarkov made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1284
- @elfiegg made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1303
- @PerkzZheng made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1307
- @bkryu made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1323
- @timlee0212 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1321
- @sergachev made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1320
- @amirkl94 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1347
- @sarckk made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1350
- @hypdeb made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1345
- @dtrifiro made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1356
Full Changelog: https://github.com/flashinfer-ai/flashinfer/compare/v0.2.8...v0.2.9