Name | Modified | Size |
---|---|---|
README.md | 2025-08-16 | 2.3 kB |
v1.9.0 source code.tar.gz | 2025-08-16 | 33.2 MB |
v1.9.0 source code.zip | 2025-08-16 | 34.1 MB |
Totals: 3 Items | | 67.3 MB |
## What's new in 1.9.0 (2025-08-16)
These are the changes in inference v1.9.0.
### New features
- FEAT: [UI] display replica data for running models by @yiboyasss in https://github.com/xorbitsai/inference/pull/3897
- FEAT: [model] Qwen-Image by @qinxuye in https://github.com/xorbitsai/inference/pull/3916
- FEAT: [model] gpt-oss by @qinxuye in https://github.com/xorbitsai/inference/pull/3924 (a launch and tool-calling sketch follows this list)
- FEAT: function calling support for deepseek-r1-0528 by @qinxuye in https://github.com/xorbitsai/inference/pull/3931
- FEAT: Support for GLM 4.5 quantized models by @Jun-Howie in https://github.com/xorbitsai/inference/pull/3945
- FEAT: streaming function call support for the SGLang engine by @aniya105 in https://github.com/xorbitsai/inference/pull/3939
- FEAT: parsing harmony format for gpt-oss by @qinxuye in https://github.com/xorbitsai/inference/pull/3948
- FEAT: Support switching rerank model engines, including reranking with the vLLM engine, by @zhcn000000 in https://github.com/xorbitsai/inference/pull/3881
- FEAT: Support GLM-4.5v by @Jun-Howie in https://github.com/xorbitsai/inference/pull/3957
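The gpt-oss and function-calling additions above are reached through Xinference's usual client and OpenAI-compatible APIs. The snippet below is a minimal sketch rather than the project's documented example: the model name "gpt-oss", the default endpoint on port 9997, the chosen engine, and the `get_weather` tool are all assumptions for illustration.

```python
from openai import OpenAI
from xinference.client import Client

# Launch gpt-oss on a running Xinference server (endpoint and engine are assumptions).
xi = Client("http://127.0.0.1:9997")
model_uid = xi.launch_model(model_name="gpt-oss", model_engine="transformers")

# Talk to it through the server's OpenAI-compatible /v1 endpoint and pass a tool
# definition; "get_weather" is a made-up tool purely for illustration.
client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="not-used")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model=model_uid,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```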
### Enhancements
- ENH: Add the new qwen3 models to the tool call list by @zhcn000000 in https://github.com/xorbitsai/inference/pull/3900
- ENH: Update chat_template for Qwen3-Coder by @Jun-Howie in https://github.com/xorbitsai/inference/pull/3944
- ENH: add the attn_implementation parameter to control flash attention by @amumu96 in https://github.com/xorbitsai/inference/pull/3951 (see the sketch after this list)
- ENH: support qwen-image gguf by @qinxuye in https://github.com/xorbitsai/inference/pull/3954
- ENH: clean embedding model cache when using vllm engine by @amumu96 in https://github.com/xorbitsai/inference/pull/3956
- BLD: Downgrade flash-attn to version 2.7.4 by @zwt-1234 in https://github.com/xorbitsai/inference/pull/3953
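A minimal sketch of how the new attn_implementation control parameter might be passed when launching a model, assuming it is forwarded to the transformers engine as the PR title suggests; the model name and the accepted values (mirroring the Hugging Face `attn_implementation` option) are assumptions, not documented behavior.

```python
from xinference.client import Client

client = Client("http://127.0.0.1:9997")

# Launch a transformers-backed model while pinning the attention backend.
# Assumption: attn_implementation is forwarded to the engine (PR #3951) and
# accepts the usual transformers values "eager", "sdpa", "flash_attention_2".
model_uid = client.launch_model(
    model_name="qwen3",
    model_engine="transformers",
    attn_implementation="flash_attention_2",
)
print(model_uid)
```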
### Bug fixes
- BUG: limit datasets version by @qinxuye in https://github.com/xorbitsai/inference/pull/3943
### Documentation
- DOC: add doc about cu128 docker by @qinxuye in https://github.com/xorbitsai/inference/pull/3899
- DOC: Update xllamacpp doc by @codingl2k1 in https://github.com/xorbitsai/inference/pull/3862
### Others
- Replace @torch.no_grad() with @torch.inference_mode() in Qwen3-Reranker by @yasu-oh in https://github.com/xorbitsai/inference/pull/3911 (sketched below)
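For context on that swap, here is a simplified stand-in (not the actual Qwen3-Reranker code): torch.inference_mode() is a drop-in replacement for torch.no_grad() that additionally disables tensor version-counter and view tracking, so pure-inference paths get slightly cheaper.

```python
import torch

# Simplified stand-in for a reranker scoring path. inference_mode() behaves
# like no_grad() but also skips autograd bookkeeping on tensors created inside
# the block; it is safe as long as none of those tensors later need gradients.
@torch.inference_mode()   # previously: @torch.no_grad()
def compute_scores(model, batch):
    return model(**batch).logits[:, -1, :]
```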
Full Changelog: https://github.com/xorbitsai/inference/compare/v1.8.1...v1.9.0