Files in v1.77:

Name                               Modified    Size
koboldcpp_nocuda.exe               2024-11-01  55.5 MB
koboldcpp_cu12.exe                 2024-11-01  587.1 MB
koboldcpp.exe                      2024-11-01  468.8 MB
koboldcpp-mac-arm64                2024-11-01  26.7 MB
koboldcpp-linux-x64-nocuda         2024-11-01  55.0 MB
koboldcpp-linux-x64-cuda1210       2024-11-01  661.2 MB
koboldcpp-linux-x64-cuda1150       2024-11-01  576.0 MB
koboldcpp_oldcpu.exe               2024-11-01  469.1 MB
koboldcpp-1.77 source code.tar.gz  2024-11-01  22.7 MB
koboldcpp-1.77 source code.zip     2024-11-01  23.2 MB
README.md                          2024-11-01  4.3 kB

Totals: 11 items, 2.9 GB

koboldcpp-1.77

the road not taken edition


  • NEW: Token Probabilities (logprobs) are now available over the API! Currently they are only returned by the synchronous (non-streaming) API, but a dedicated /api/extra/last_logprobs endpoint is also provided (see the sketch after this changelog). If "logprobs" is enabled in the KoboldAI Lite settings, a link to view alternate token probabilities is shown for both streaming and non-streaming responses. This will also work in SillyTavern when streaming is disabled, once its latest build is out.
  • Response prompt_tokens, completion_tokens and total_tokens are now accurate values instead of placeholders.
  • Enabled CUDA graphs for the cuda12 build, which can improve performance on some cards.
  • Fixed a bug where .wav audio files uploaded directly to the /v1/audio/transcriptions endpoint were fragmented and cut off early (a direct-upload sketch follows the changelog). Audio sent as base64 within a JSON payload is unaffected.
  • Fixed a bug where Whisper transcription blocked generation in non-multiuser mode.
  • Fixed a bug where trim_stop did not remove a stop sequence that was divided across multiple tokens in some cases.
  • Significantly increased the maximum limits for stop sequences, anti-slop token bans, logit biases and DRY sequence breakers (thanks to @mayaeary for the PR, which changes the way some parameters are passed to the CPP side).
  • Added link to help page if user fails to select a model.
  • The Flash Attention toggle in the GUI quick launcher is now hidden by default when Vulkan is selected (it usually reduces performance there).
  • Updated Kobold Lite with multiple fixes and improvements:
  • NEW: Experimental ComfyUI support added! ComfyUI can now be used as an image generation backend API from within KoboldAI Lite. No workflow customization is necessary. Note: ComfyUI must be launched with the flags --listen --enable-cors-header '*' to enable API access. You can then use it like any other image generation backend.
  • Clarified the option for selecting A1111/Forge/KoboldCpp as an image gen backend, since Forge is gradually superseding A1111. This option is compatible with all 3 of the above.
  • You can now generate images from instruct mode via natural language, similar to ChatGPT (e.g. "Please generate an image of a bag of sand"). This requires an image model to be loaded; it uses regex matching, is enabled by default, and can be disabled in settings.
  • Added support for Tavern "V3" character cards. Note that V3 is not a distinct format; it's an augmented V2 card used by Risu that adds extra metadata chunks. Those chunks are not supported in Lite, but the base V2 card functionality will work.
  • Added new scenario "Interactive Storywriter": This is similar to story writing mode, but allows you to secretly steer the story with hidden instruction prompts.
  • Added Token Probability Viewer - You can now see a table of alternative token probabilities in responses. Disabled by default, enable in advanced settings.
  • Fixed JSON file selection problems in some mobile browsers.
  • Fixed Aetherroom importer.
  • Minor Corpo UI layout tweaks by @Ace-Lite
  • Merged fixes and improvements from upstream
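
For those integrating against the API, here is a minimal sketch of requesting token probabilities. The /api/extra/last_logprobs endpoint name comes from the notes above; the "logprobs" request flag and the exact response layout are assumptions, so check --help and the bundled API docs:

```python
# Minimal sketch: ask for token probabilities over the KoboldCpp HTTP API.
# Assumes a server running at the default http://localhost:5001.
import requests

BASE = "http://localhost:5001"

# 1) A normal synchronous (non-streaming) generation, asking for logprobs.
#    The "logprobs" flag name is an assumption based on the release notes.
gen = requests.post(f"{BASE}/api/v1/generate", json={
    "prompt": "The quick brown fox",
    "max_length": 16,
    "logprobs": True,
})
gen.raise_for_status()
print(gen.json()["results"][0]["text"])

# 2) Fetch alternate token probabilities for the last finished generation
#    via the dedicated endpoint (assumed POST; response layout may differ).
lp = requests.post(f"{BASE}/api/extra/last_logprobs", json={})
lp.raise_for_status()
print(lp.json())
```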
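
Similarly, a direct .wav upload to the transcription endpoint might look like the following. The endpoint path is from the fix above; the multipart field name follows the OpenAI convention and is an assumption:

```python
# Minimal sketch: direct .wav upload to the OpenAI-compatible
# /v1/audio/transcriptions endpoint.
import requests

with open("sample.wav", "rb") as f:  # hypothetical local audio file
    r = requests.post(
        "http://localhost:5001/v1/audio/transcriptions",
        files={"file": ("sample.wav", f, "audio/wav")},
    )
r.raise_for_status()
print(r.json().get("text"))  # transcribed text, per the OpenAI response shape
```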

To use, download and run koboldcpp.exe, which is a one-file pyinstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller. If you have an Nvidia GPU but an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe. If you have a newer Nvidia GPU, you can use the CUDA 12 version, koboldcpp_cu12.exe (much larger, slightly faster). If you're using Linux, select the appropriate Linux binary instead (not an .exe). If you're on a modern macOS machine (M1, M2, M3), you can try the koboldcpp-mac-arm64 binary. If you're using AMD, you can try the koboldcpp_rocm build from YellowRoseCx's fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
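
Once the server is up, a minimal completion request looks like this (the /api/v1/generate endpoint and response shape follow the KoboldAI API; the sampler values here are arbitrary examples):

```python
# Minimal sketch: request a completion from a running KoboldCpp instance.
import requests

resp = requests.post("http://localhost:5001/api/v1/generate", json={
    "prompt": "Once upon a time,",
    "max_length": 64,       # number of tokens to generate
    "temperature": 0.7,     # arbitrary example sampler setting
})
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```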

For more information, be sure to run the program from command line with the --help flag.

Source: README.md, updated 2024-11-01