Name | Modified | Size |
---|---|---|
README.md | 2025-08-16 | 2.3 kB |
v1.9.0 source code.tar.gz | 2025-08-16 | 33.2 MB |
v1.9.0 source code.zip | 2025-08-16 | 34.1 MB |
Totals: 3 Items | | 67.3 MB |
## What's new in 1.9.0 (2025-08-16)
These are the changes in inference v1.9.0.
### New features
- FEAT: [UI] display replica data for running models by @yiboyasss in https://github.com/xorbitsai/inference/pull/3897
- FEAT: [model] Qwen-Image by @qinxuye in https://github.com/xorbitsai/inference/pull/3916
- FEAT: [model] gpt-oss by @qinxuye in https://github.com/xorbitsai/inference/pull/3924 (a launch and tool-calling sketch follows this list)
- FEAT: function calling support for deepseek-r1-0528 by @qinxuye in https://github.com/xorbitsai/inference/pull/3931
- FEAT: Support for GLM 4.5 quantized models by @Jun-Howie in https://github.com/xorbitsai/inference/pull/3945
- FEAT: streaming function call support for the SGLang engine by @aniya105 in https://github.com/xorbitsai/inference/pull/3939
- FEAT: parsing harmony format for gpt-oss by @qinxuye in https://github.com/xorbitsai/inference/pull/3948
- FEAT: Support switching rerank model engines, including reranking with the vLLM engine, by @zhcn000000 in https://github.com/xorbitsai/inference/pull/3881
- FEAT: Support GLM-4.5v by @Jun-Howie in https://github.com/xorbitsai/inference/pull/3957
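The gpt-oss and function-calling additions above are reached through Xinference's usual client and OpenAI-compatible APIs. The snippet below is a minimal sketch rather than the project's documented example: the model name "gpt-oss", the default endpoint on port 9997, the chosen engine, and the `get_weather` tool are all assumptions for illustration.

```python
from openai import OpenAI
from xinference.client import Client

# Launch gpt-oss on a running Xinference server (endpoint and engine are assumptions).
xi = Client("http://127.0.0.1:9997")
model_uid = xi.launch_model(model_name="gpt-oss", model_engine="transformers")

# Talk to it through the server's OpenAI-compatible /v1 endpoint and pass a tool
# definition; "get_weather" is a made-up tool purely for illustration.
client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="not-used")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model=model_uid,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```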
### Enhancements
- ENH: Add the new qwen3 models to the tool call list by @zhcn000000 in https://github.com/xorbitsai/inference/pull/3900
- ENH: Update chat_template for Qwen3-Coder by @Jun-Howie in https://github.com/xorbitsai/inference/pull/3944
- ENH: add the attn_implementation parameter to control flash attention by @amumu96 in https://github.com/xorbitsai/inference/pull/3951 (see the sketch after this list)
- ENH: support qwen-image gguf by @qinxuye in https://github.com/xorbitsai/inference/pull/3954
- ENH: clean embedding model cache when using vllm engine by @amumu96 in https://github.com/xorbitsai/inference/pull/3956
- BLD: Downgrade flash-attn to version 2.7.4 by @zwt-1234 in https://github.com/xorbitsai/inference/pull/3953
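A minimal sketch of how the new attn_implementation control parameter might be passed when launching a model, assuming it is forwarded to the transformers engine as the PR title suggests; the model name and the accepted values (mirroring the Hugging Face `attn_implementation` option) are assumptions, not documented behavior.

```python
from xinference.client import Client

client = Client("http://127.0.0.1:9997")

# Launch a transformers-backed model while pinning the attention backend.
# Assumption: attn_implementation is forwarded to the engine (PR #3951) and
# accepts the usual transformers values "eager", "sdpa", "flash_attention_2".
model_uid = client.launch_model(
    model_name="qwen3",
    model_engine="transformers",
    attn_implementation="flash_attention_2",
)
print(model_uid)
```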
### Bug fixes
- BUG: limit datasets version by @qinxuye in https://github.com/xorbitsai/inference/pull/3943
### Documentation
- DOC: add doc about cu128 docker by @qinxuye in https://github.com/xorbitsai/inference/pull/3899
- DOC: Update xllamacpp doc by @codingl2k1 in https://github.com/xorbitsai/inference/pull/3862
### Others
- Replace @torch.no_grad() with @torch.inference_mode() in Qwen3-Reranker by @yasu-oh in https://github.com/xorbitsai/inference/pull/3911 (sketched below)
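For context on that swap, here is a simplified stand-in (not the actual Qwen3-Reranker code): torch.inference_mode() is a drop-in replacement for torch.no_grad() that additionally disables tensor version-counter and view tracking, so pure-inference paths get slightly cheaper.

```python
import torch

# Simplified stand-in for a reranker scoring path. inference_mode() behaves
# like no_grad() but also skips autograd bookkeeping on tensors created inside
# the block; it is safe as long as none of those tensors later need gradients.
@torch.inference_mode()   # previously: @torch.no_grad()
def compute_scores(model, batch):
    return model(**batch).logits[:, -1, :]
```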
Full Changelog: https://github.com/xorbitsai/inference/compare/v1.8.1...v1.9.0