# Guidance 0.2.3
This release is primarily a performance hotfix, with a few extras snuck in along the way.
## Added
- Added the Llama 3.2 chat template
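
For context, here is a minimal sketch of how a chat template comes into play; the model id `meta-llama/Llama-3.2-1B-Instruct` and the role-block usage are assumptions for illustration, not taken from these notes:

```python
# Illustrative sketch only: the model id and usage pattern are
# assumptions, not something stated in these release notes.
from guidance import models, gen, system, user, assistant

# Loading a Llama 3.2 checkpoint; guidance resolves the matching
# chat template so role blocks render the right special tokens.
lm = models.Transformers("meta-llama/Llama-3.2-1B-Instruct")

with system():
    lm += "You are a concise assistant."
with user():
    lm += "Name the capital of France."
with assistant():
    lm += gen("answer", max_tokens=16)

print(lm["answer"])
```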
## Removed
- Deleted some dead code, in particular `sample_with_temperature` from the Engine classes
## Changed
- Switched the widget's top-k implementation from a full sort to a priority queue, saving a few milliseconds per token when the widget/visualization is enabled (see the sketch below)
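
As an illustration of the technique (a minimal sketch, not the library's actual code), a size-k min-heap finds the k largest logits in O(n log k) rather than the O(n log n) of a full sort:

```python
import heapq

def top_k_heap(logits, k):
    """Return the k largest (logit, token_id) pairs, largest first.

    A size-k min-heap does O(n log k) work instead of the O(n log n)
    of sorting the whole logits vector; for a ~100k-token vocabulary
    and small k this shaves time off every decoded token.
    """
    heap = []  # min-heap; heap[0] is the smallest logit currently kept
    for token_id, logit in enumerate(logits):
        if len(heap) < k:
            heapq.heappush(heap, (logit, token_id))
        elif logit > heap[0][0]:
            # New candidate beats the current k-th best: swap it in.
            heapq.heapreplace(heap, (logit, token_id))
    return sorted(heap, reverse=True)  # order the k survivors for display

print(top_k_heap([0.1, 2.3, -1.0, 5.5, 0.7], k=2))  # [(5.5, 3), (2.3, 1)]
```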
## Fixed
- Fixed the performance regression introduced in [#1261]: the full logits history is no longer cached. As a consequence, fast-forwarded token probabilities are only available in the widget the first time those tokens are added to the KV cache, and will be missing otherwise.
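
To see why caching the full logits history is costly, here is a back-of-the-envelope estimate; the vocabulary size (Llama-3-sized, 128,256 tokens) and float32 logits are illustrative assumptions, not figures from these notes:

```python
# Rough memory cost of caching a full logits history.
# VOCAB_SIZE and dtype are assumptions for illustration only.
VOCAB_SIZE = 128_256   # e.g. the Llama 3 tokenizer vocabulary
BYTES_PER_LOGIT = 4    # float32

per_token_mb = VOCAB_SIZE * BYTES_PER_LOGIT / 1e6
history_gb = VOCAB_SIZE * BYTES_PER_LOGIT * 1000 / 1e9

print(f"per token:            {per_token_mb:.2f} MB")  # ~0.51 MB
print(f"1,000-token history:  {history_gb:.2f} GB")    # ~0.51 GB
```

Under these assumptions, dropping the cache saves roughly half a gigabyte per thousand generated tokens, at the cost of the widget feature described above.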