
Release files (updated 2025-03-11)

  β€’ llamafactory-0.9.2.tar.gz (235.1 kB)
  β€’ llamafactory-0.9.2-py3-none-any.whl (279.8 kB)
  β€’ README.md (7.9 kB)
  β€’ v0.9.2_ MiniCPM-o, SwanLab, APOLLO source code.tar.gz (9.7 MB)
  β€’ v0.9.2_ MiniCPM-o, SwanLab, APOLLO source code.zip (9.8 MB)

This is the last release before LLaMA-Factory v1.0.0. We are working hard to improve its efficiency and usability.

We will attend the vLLM Beijing Meetup on Mar 16th! See you in Beijing πŸ‘‹

New features

  • πŸ”₯ APOLLO optimizer by @zhuhanqing in [#6617]
  • πŸ”₯ SwanLab experiment tracker by @Zeyi-Lin in [#6401]
  • πŸ”₯ Ray Trainer by @erictang000 in [#6542]
  • Batch inference with vLLM TP by @JieShenAI in [#6190]
  • QLoRA on Ascend NPU by @codemayq in [#6601]
  • Yarn and Llama3 rope scaling by @hiyouga in [#6693]
  • Support uv run by @erictang000 in [#6907]
  • Ollama modelfile auto-generation by @codemayq in [#4686]
  • Mistral tool prompt by @AlongWY in [#5473]
  • Llama3 and Qwen2 tool prompt by @hiyouga in [#6367] and [#6369]

New models

  • Base models
  • GPT2 (0.1B/0.4B/0.8B/1.5B) πŸ“„
  • Granite 3.0-3.1 (1B/2B/3B/8B) πŸ“„
  • PaliGemma2 (3B/10B/28B) πŸ“„πŸ–ΌοΈ
  • Moonlight (16B) πŸ“„
  • DeepSeek V2-V2.5 Base (236B) πŸ“„
  • DeepSeek V3 Base (671B) πŸ“„
  • Instruct/Chat models
  • Granite 3.0-3.1 (1B/2B/3B/8B) by @Tuyohai in [#5922] πŸ“„πŸ€–
  • DeepSeek R1 (1.5B/7B/8B/14B/32B/70B/671B) by @Qwtdgh in [#6767] πŸ“„πŸ€–
  • TeleChat2 (3B/7B/12B/35B/115B) @ge-xing in [#6313] πŸ“„πŸ€–
  • Qwen2.5-VL (3B/7B/72B) by @hiyouga in [#6779] πŸ“„πŸ€–πŸ–ΌοΈ
  • PaliGemma2-mix (3B/10B/28B) by @Kuangdd01 in [#7060] πŸ“„πŸ€–πŸ–ΌοΈ
  • Qwen2 Audio (7B) by @BUAADreamer in [#6701] πŸ“„πŸ€–πŸ”ˆ
  • MiniCPM-V/MiniCPM-o (8B) by @BUAADreamer in [#6598] and [#6631] πŸ“„πŸ€–πŸ–ΌοΈπŸ”ˆ
  • InternLM3-Instruct (8B) by @hhaAndroid in [#6640] πŸ“„πŸ€–
  • Marco-o1 (8B) πŸ“„πŸ€–
  • Skywork-o1 (8B) πŸ“„πŸ€–
  • Phi-4 (14B) πŸ“„πŸ€–
  • Moonlight Instruct (16B) πŸ“„
  • Mistral Small (24B) πŸ“„πŸ€–
  • QwQ (32B) πŸ“„πŸ€–
  • Llama-3.3-Instruct (70B) πŸ“„πŸ€–
  • QvQ (72B) πŸ“„πŸ€–πŸ–ΌοΈ
  • DeepSeek V2-V2.5 (236B) πŸ“„πŸ€–
  • DeepSeek V3 (671B) πŸ“„πŸ€–

New datasets

  • Supervised fine-tuning datasets
  • OpenO1 (en) πŸ“„
  • Open Thoughts (en) πŸ“„
  • Open-R1-Math (en) πŸ“„
  • Chinese-DeepSeek-R1-Distill (zh) πŸ“„

Changes

  • Refactor VLMs register by @hiyouga in [#6600]
  • Refactor mm plugin by @hiyouga in [#6895]
  • Refactor template by @hiyouga in [#6896]
  • Refactor data pipeline by @hiyouga in [#6901]
  • Update vlm arguments by @hiyouga in [#6976]
  • We have cleaned large files in git history using BFG Repo-Cleaner, find the backup repo here

Bug fixes

  • Add trust_remote_code option by @yafshar in [#5819]
  • Fix mllama config by @hiyouga in [#6137] and [#6140]
  • Fix mllama pad by @hiyouga in [#6151] and [#6874]
  • Pin tokenizers version by @hiyouga in [#6157]
  • Fix tokenized data loading by @village-way in [#6160]
  • Show hostname in webui by @hykilpikonna in [#6170]
  • Fix VLMs zero3 training by @hiyouga in [#6233]
  • Add skip_special_tokens by @hiyouga in [#6363]
  • Support non-reenterent-gc by @hiyouga in [#6364]
  • Add disable_shuffling option by @hiyouga in [#6388]
  • Fix gen kwargs by @hiyouga in [#6395]
  • Enable module run by @youkaichao in [#6457]
  • Fix eval loss value by @hiyouga in [#6465]
  • Fix paligemma inference by @hiyouga in [#6483]
  • Add deepseek v3 template by @piamo in [#5507]
  • Add http proxy argument in dockerfile by @shibingli in [#6462]
  • Fix trainer generate by @hiyouga in [#6512]
  • Fix pixtral DPO training by @hiyouga in [#6547]
  • Fix ray args by @stephen-nju in [#6564]
  • Fix minicpm template by @BUAADreamer in [#6620]
  • Fix stop tokens for visual detection by @hiyouga in [#6624]
  • Pin vllm version by @hiyouga in [#6629]
  • Fix mllama any image by @hiyouga in [#6637] and [#7053]
  • Fix tokenizer max length by @xiaosu-zhu in [#6632]
  • Fix webui locale by @steveepreston in [#6653]
  • Fix MiniCPM-o DPO training by @BUAADreamer in [#6657]
  • Fix Qwen2 MoE training by @hiyouga in [#6684]
  • Upgrade to gradio 5 by @hiyouga in [#6688]
  • Support Japanese local file by @engchina in [#6698]
  • Fix DPO loss by @yinpu in [#6722]
  • Webui thinking mode by @hiyouga in [#6778]
  • Upgrade to transformers 4.48 by @hiyouga in [#6628]
  • Fix ci by @hiyouga in [#6787]
  • Fix instructions about installing fa2 on win platform in readme by @neavo in [#6788]
  • Fix minicpmv plugin by @BUAADreamer in [#6801], [#6890], [#6946] and [#6998]
  • Fix qwen2 tool prompt by @yueqis in [#6796]
  • Fix llama pro by @hiyouga in [#6814]
  • Allow thought in function call by @yueqis in [#6797]
  • Add ALLOW_EXTRA_ARGS by @hiyouga in [#6831]
  • Fix Qwen2vl plugin by @hiyouga in [#6855]
  • Upgrade vllm to 0.7.2 by @hiyouga in [#6857]
  • Fix unit test for tool using by @hiyouga in [#6865]
  • Skip broken data in sharegpt converter by @JJJYmmm in [#6879]
  • Fix qwen2.5 plugin for video by @JJJYmmm in [#6868]
  • Parsing chat template from tokenizer by @hiyouga in [#6905] (experimental)
  • Fix mllama KTO training by @marko1616 in [#6904]
  • Fix grad checkpointing by @hiyouga in [#6916] and [#6931]
  • Fix ollama template by @hiyouga in [#6902]
  • Fix ray example by @erictang000 in [#6906]
  • Improve error handling for media by @noahc1510 in [#6128]
  • Support split on each dataset by @SrWYG in [#5522]
  • Fix gen kwargs in training by @aliencaocao in [#5451]
  • Liger kernel for qwen2.5vl by @hiyouga in [#6930]
  • Fix lora target modules by @hiyouga in [#6944]
  • Add ray_storage_path by @erictang000 in [#6920]
  • Fix trainer.predict by @hiyouga in [#6972]
  • Add min resolution control by @hiyouga in [#6975]
  • Upgrade transformers to 4.49 by @hiyouga in [#6982]
  • Add seed in vllm batch predict by @JieShenAI in [#7058]
  • Fix pyproject.toml by @hiyouga in [#7067]
  • Upgrade CANN images by @leo-pony in [#7061]
  • Display swanlab link by @Zeyi-Lin in [#7089]
  • Fix hf engine by @hiyouga in [#7120]
  • Add bailing chat template by @oldstree in [#7117]
  • Use bicubic resampler instead of nearest by @hiyouga in [#7143]
  • Fix Qwen2Audio plugin by @lsrami in [#7166]
  • Destroy process group by @hiyouga in [#7174]
  • Fix swanlab callback by @Zeyi-Lin in [#7176]
  • Fix paligemma plugin by @hiyouga in [#7181]
  • Escape html tag in webui by @hiyouga in [#7190]
  • Upgrade vllm to 0.7.3 by @hiyouga in [#7183] and [#7193]
  • Fix parser by @hiyouga in [#7204]
  • Fix function formatter by @zhangch-ss in [#7201]
  • Fix deepspeed config by @hiyouga in [#7205]
  • Fix dataloader by @hiyouga in [#7207]
  • Fix export tokenizer by @hiyouga in [#7230]
  • Update arguments by @hiyouga in [#7231]
  • Add swanlab_logdir by @Zeyi-Lin in [#7219]
  • Fix vllm batch prediction by @hiyouga in [#7235]
  • Avoid exit after saving tokenized data by @hiyouga in [#7244]
  • Support commit in env by @hiyouga in [#7247]
  • Release v0.9.2 by @hiyouga in [#7242]
  • Fix [#1204] [#3306] [#3462] [#5121] [#5270] [#5404] [#5444] [#5472] [#5518] [#5616] [#5712] [#5714] [#5756] [#5944] [#5986] [#6020] [#6056] [#6092] [#6136] [#6139] [#6149] [#6165] [#6213] [#6287] [#6320] [#6345] [#6345] [#6346] [#6348] [#6358] [#6362] [#6391] [#6415] [#6439] [#6448] [#6452] [#6482] [#6499] [#6543] [#6546] [#6551] [#6552] [#6610] [#6612] [#6636] [#6639] [#6662] [#6669] [#6738] [#6772] [#6776] [#6780] [#6782] [#6793] [#6806] [#6812] [#6819] [#6826] [#6833] [#6839] [#6850] [#6854] [#6860] [#6878] [#6885] [#6889] [#6937] [#6948] [#6952] [#6960] [#6966] [#6973] [#6981] [#7036] [#7064] [#7072] [#7116] [#7125] [#7130] [#7171] [#7173] [#7180] [#7182] [#7184] [#7192] [#7198] [#7213] [#7234] [#7243]

Full Changelog: https://github.com/hiyouga/LLaMA-Factory/compare/v0.9.1...v0.9.2

Source: README.md, updated 2025-03-11