Name | Modified | Size
---|---|---
koboldcpp.exe | 2025-04-21 | 507.8 MB
koboldcpp-mac-arm64 | 2025-04-21 | 26.9 MB
koboldcpp-linux-x64-nocuda | 2025-04-21 | 77.8 MB
koboldcpp-linux-x64-cuda1210 | 2025-04-21 | 701.2 MB
koboldcpp-linux-x64-cuda1150 | 2025-04-21 | 612.7 MB
koboldcpp_oldcpu.exe | 2025-04-21 | 507.9 MB
koboldcpp_nocuda.exe | 2025-04-21 | 77.1 MB
koboldcpp_cu12.exe | 2025-04-21 | 624.0 MB
koboldcpp-1.89 source code.tar.gz | 2025-04-20 | 27.7 MB
koboldcpp-1.89 source code.zip | 2025-04-20 | 28.1 MB
README.md | 2025-04-20 | 3.9 kB

Totals: 11 items, 3.2 GB
koboldcpp-1.89
https://github.com/user-attachments/assets/a2969fa0-e637-459d-918a-9eab056a0b94
- NEW: Improved NoScript mode - NoScript mode now has chat mode and image generation support entirely without JavaScript! Access it by default at http://localhost:5001/noscript in your browser. Tested to work on Internet Explorer 5, Netscape Navigator 4, NetSurf, Lynx, Dillo, and basically any browser made after 1999.
- Added new launcher flags `--overridekv` and `--overridetensors`, which work in the same way as llama.cpp's flags. `--overridekv` allows you to specify a single metadata property to be overwritten; the input format is `keyname=type:value`. `--overridetensors` allows you to place tensors matching a pattern onto a specific backend; the input format is `tensornamepattern=buffertype`. Example invocations are sketched after this list.
- Enabled @jeffbolznv's coopmat2 support for Vulkan (supports flash attention, overall slightly faster). CM2 is only enabled if you have the latest Nvidia Game Ready Driver (576.02) and should provide all-round speedups. Note that the (OldCPU) Vulkan binaries will now exclude coopmat, coopmat2 and DP4A, so please use OldCPU mode if you encounter issues.
- Display available GPU memory when estimating layers
- Fixed RWKV model loading
- Added more sanity checks for Zenity and made YAD the default filepicker instead. If you still encounter issues, please select the Legacy TK filepicker on the Extras page and report the issue.
- Minor fixes for StableUI inpainting brush selection.
- Enabled usage of Image Generation LoRAs even with a quantized diffusion model (the LoRA should still be unquantized)
- Fixed a crash when using certain image LoRAs due to graph size limits. Also reverted CLIP quant to f32 changes.
- CLI mode fixes
- Updated Kobold Lite, multiple fixes and improvements
- IMPORTANT: Relocated the Tokens tab and WebSearch tab into the Settings panel (previously in the Context panel). Likewise, the regex and token sequence configs are now stored in settings rather than in the story (and will persist even on a new story).
- Fixed URLs not opening in a new tab
- Reworked thinking tag handling - now separates display and submit regex behaviors (3 modes each)
- Added Retain History toggle for WebSearch to retain some old search results on subsequent queries.
- Added an editable template for the character creator (by @PeterPeet)
- Increased to 10 local and 10 remote save slots.
- Removed aetherroom.club (dead site)
- Merged fixes and improvements from upstream
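
As a rough sketch of the two new flags, here is a hedged example (the model path, metadata key, and tensor pattern below are illustrative placeholders, assuming the same syntax as the corresponding llama.cpp flags):

```
# Overwrite a single GGUF metadata property at load time (key/type/value are illustrative)
./koboldcpp-linux-x64-nocuda --model mymodel.gguf \
    --overridekv tokenizer.ggml.add_bos_token=bool:false

# Keep tensors whose names match a pattern on the CPU backend (pattern is illustrative)
./koboldcpp-linux-x64-nocuda --model mymodel.gguf \
    --overridetensors "ffn_.*=CPU"
```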
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller. If you have an Nvidia GPU but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe. If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster). If you're using Linux, select the appropriate Linux binary instead (not the .exe). If you're on a modern macOS (M1, M2, M3), you can try the koboldcpp-mac-arm64 macOS binary. If you're using AMD, we recommend trying the Vulkan option (available in all releases) first for best support. Alternatively, you can try koboldcpp_rocm from YellowRoseCx's fork.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI. Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client). For more information, be sure to run the program from the command line with the `--help` flag. You can also refer to the readme and the wiki.
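
As a minimal end-to-end sketch (the model filename is a placeholder, and the request below uses the KoboldAI-compatible `/api/v1/generate` endpoint that koboldcpp serves; run with `--help` to confirm the options for your build):

```
# Start the server on the default port (model path is a placeholder)
./koboldcpp-linux-x64-nocuda --model mymodel.gguf --port 5001

# Once the model has loaded, request a completion from another terminal
curl -s http://localhost:5001/api/v1/generate \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Once upon a time,", "max_length": 32}'
```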