🚨 Breaking Changes
Upstream Patches for CCE, Phi3, Phi4
Our Cut Cross-Entropy (CCE) patches have been moved to a dedicated upstream fork. This improves maintainability, makes it easier for the community to contribute new patches, and allows re-use across projects. This update includes:
- Updates to support `transformers>=4.52.4`.
- New patches for `phi3` and `phi4_multimodal`.
- All patches have been sanity-tested for reliability.
Please make sure to install CCE from our fork instead. We recommend using the provided script in the repo:

```bash
python scripts/cutcrossentropy_install.py | sh
```
- Contributed by @NanoCode012 in #2813.
Support for PyTorch 2.5.1 is dropped, as PyTorch 2.8.0 is slated to be released later this month. We recommend using `torch==2.7.0` or `2.7.1`.
Docker images now default to torch 2.7.1 when using `main-latest` tags.
vLLM is no longer included in Docker images for `torch==2.6.0`, because the vLLM wheels use the incorrect ABI for 2.6.0, and the last vLLM release supporting torch 2.6.0 is 0.8.5.post1. See https://github.com/vllm-project/vllm/issues/13608 for more details. Similarly, vLLM is only included in the `torch==2.7.0` images, since vLLM is pinned to that particular torch version and 2.7.1 support is still in review.
🎉 New features
Added Chunked cross entropy loss
We've introduced `chunked_cross_entropy` as an alternative to the default trainer loss function. This can help reduce peak memory usage during training, especially for models with large vocabularies.
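To try it, flip the flag in your config. A minimal sketch, assuming the option is the top-level boolean its name suggests:

```yaml
# Swap the default trainer loss for the chunked variant to reduce peak memory.
chunked_cross_entropy: true
```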
- Contributed by @winglian in #2625.
Added Support for Falcon-H1
You can now fine-tune models from the Falcon-H1 family. Run one of the example configs to get started.
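A minimal config sketch; the model ID and dataset below are illustrative placeholders, so check the Hugging Face hub for the exact Falcon-H1 checkpoint you want:

```yaml
# Illustrative Falcon-H1 fine-tune; model ID is an assumption, verify on the HF hub.
base_model: tiiuae/Falcon-H1-0.5B-Base
datasets:
  - path: tatsu-lab/alpaca
    type: alpaca
sequence_len: 2048
micro_batch_size: 1
num_epochs: 1
```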
- Contributed by @younesbelkada in #2811.
Added Support for Devstral Small 2505
It is now possible to fine-tune Devstral models in Axolotl. Give it a try by following our docs.
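As with other new architectures, the main change is pointing `base_model` at a Devstral checkpoint. A sketch, assuming the Hugging Face ID below; see the docs for any tokenizer-specific settings:

```yaml
# Hypothetical Devstral entry point; consult the docs for required extras.
base_model: mistralai/Devstral-Small-2505
```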
- Contributed by @NanoCode012 in #2880.
TiledMLP support
TiledMLP, introduced in Arctic Long Sequence Training (ALST), reduces the activation memory footprint of long sequences in the MLP modules.
This currently only works with DeepSpeed ZeRO-1 through ZeRO-3; single GPU, DDP, and FSDP are not yet supported. Enable it via `tiled_mlp: true`. Follow the linked PR for more info.
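A minimal sketch of pairing it with DeepSpeed; the ZeRO-3 config path is illustrative (Axolotl ships several under `deepspeed_configs/`):

```yaml
# TiledMLP only works with DeepSpeed ZeRO-1 through ZeRO-3.
tiled_mlp: true
deepspeed: deepspeed_configs/zero3.json
```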
- Contributed by @winglian in #2811.
DenseMixer integration
DenseMixer is a MoE post-training method that improves router gradient estimation during MoE training. Read our docs to learn more.
- Contributed by @winglian in #2868.
Flexible Evaluation Sequence Length
You can now set a different `eval_sequence_len` in your config. This allows you to train with one sequence length but run evaluations on a longer or shorter one, providing more flexibility for testing model capabilities.
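For example, to train at 4k tokens but evaluate at 8k (values are illustrative):

```yaml
# Train on shorter sequences, evaluate on longer ones.
sequence_len: 4096
eval_sequence_len: 8192
```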
- Contributed by @winglian in #2836.
Improved Merge LoRA on CPU for DPO
The `--lora-on-cpu` flag now correctly moves LoRA adapters to CPU, even for DPO. This is useful for saving VRAM when merging LoRA adapters on machines with limited GPU memory.
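A usage sketch, assuming the `axolotl merge-lora` entry point and your own config path:

```bash
# Merge LoRA weights on CPU to avoid exhausting GPU memory.
axolotl merge-lora your_config.yml --lora-on-cpu
```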
- Contributed by @kallewoof in #2766.
Other Feature Enhancements
- Log Configuration on Startup: Axolotl now logs the full, resolved configuration at the start of every run, making it much easier to verify your settings. (by @djsaunde in #2819)
- `chat_template` kwargs: Restored the ability to pass additional arguments to your chat templates for more flexible formatting. (by @NanoCode012 in #2837)
- Support Jinja2 template paths in `chat_template_jinja` and re-formatting of string templates to files; see the sketch below. (by @winglian in #2795)
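A config sketch for the Jinja2 path support; the template path is illustrative, and the `chat_template: jinja` pairing is our reading of the existing option:

```yaml
# Load the chat template from a .jinja file instead of an inline string.
chat_template: jinja
chat_template_jinja: ./templates/chatml.jinja
```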
📦 Dependency Updates
- `flash-attn` upgraded to `2.8.0.post2`. (by @winglian in #2828)
- `accelerate` upgraded to `1.8.1` and `bitsandbytes` to `0.46.0`. (by @winglian in #2815)
- `mistral-common` upgraded to `1.6.3` to fix multiprocessing pickling issues. (by @NanoCode012 in #2790)
- `transformers` upgraded to `4.53.1`. (by @winglian in #2844)
🔧 Major fixes
Reduced Startup Time for Sample Packing
Due to a regression in a prior PR, the trainer took longer to start when packing was enabled because it spawned too many new processes. This is now fixed.
- Contributed by @winglian in #2830.
Distributed Training Fixes
- DeepSpeed Initialization: Resolved an issue where DeepSpeed would fail to initialize correctly after a recent refactor. (by @djsaunde in #2820)
- Sequence Parallelism VRAM: Addressed a high VRAM usage issue when using Sequence Parallelism with RL trainers. (by @djsaunde in #2829)
- FSDP / Device Mesh: Ensured that device mesh patching is correctly applied for FSDP training. (by @djsaunde in #2842)
Iterable Dataset Fixes
Resolved critical bugs affecting iterable datasets, improving their stability and usability:
- Fixed pickling errors that would prevent training from resuming. (by @winglian in #2831)
- Fixed a failure during preprocessing when sampling from an iterable dataset. (by @winglian in #2825)
Tokenization Stall Fixes
Addressed a tokenization stall with single long datasets that caused tokenization to take hours. (by @NanoCode012 in #2845)
General Stability Fixes
- Gemma3: Re-added the Gemma3 loss patch that was inadvertently removed, fixing training for these models. (#2817 by @NanoCode012)
- Train Sampler: Fixed a `'NoneType' object has no attribute 'column_names'` error that could occur with the train data sampler. (#2822 by @NanoCode012)
- Packing: Added an assertion to the packing patch to prevent silent failures. (by @winglian in #2840)
Other Improvements
- fix: catch httperror from ratelimiting hf when checking user token by @NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2827
- chore: update pre-commit hooks by @github-actions in https://github.com/axolotl-ai-cloud/axolotl/pull/2821
- fix(doc): default messages example used wrong key by @NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2832
- feat: replace old colab notebook with newer one by @NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2838
- set a different triton cache for each test to avoid blocking writes to cache by @winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2843
- feat(doc): update docker tag examples by @NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2851
- fix nightlies to use correct cache by @winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2848
- build fa2 from source for base image with torch2.6 and cu124 by @winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2867
- respect shuffle_merged_datasets for single dataset too by @winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2866
- don't use tokenizer parallelism when using packing by @winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2862
- Fix: do not call preprocess in multimodal or pretraining case by @NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2861
- setup defaults for dataloader to ensure GPU is kept busy by @winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2632
- use latest version of cce fork for SP fix by @winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2871
- feat(doc): add vllm and fa2 incompat error to faq by @NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2877
- mark flaky geglu tests and add torch seed by @winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2876
- chore: update pre-commit hooks by @github-actions in https://github.com/axolotl-ai-cloud/axolotl/pull/2870
- Fix link in FSDP + QLoRA docs by @float-trip in https://github.com/axolotl-ai-cloud/axolotl/pull/2879
- fix: set add_generation_prompt to False when apply chat template for multimodal by @NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2859
- chore: update cce commit to include gemma3n fixes by @NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2881
- Feat: add devstral model support by @NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2880
- add 2.7.0 torch images back to support vLLM by @winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2885
- Print slowest durations for tests by @SalmanMohammadi in https://github.com/axolotl-ai-cloud/axolotl/pull/2887
- fix xformers version by @winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2888
- release v0.11.0 by @winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2875
New Contributors
- @nyxkrage made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2787
- @float-trip made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2879
Full Changelog: https://github.com/axolotl-ai-cloud/axolotl/compare/v0.10.0...v0.11.0