In addition to the usual FP32, rwkv.cpp supports FP16 and quantized INT4, INT5, and INT8 inference. The project focuses on CPU inference, but cuBLAS is also supported. RWKV is a novel large language model architecture; the largest model in the family has 14B parameters. Unlike a Transformer, whose attention cost grows as O(n^2) with context length, RWKV needs only the state from the previous step to calculate logits. This makes RWKV very CPU-friendly at large context lengths.
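The "only the previous state" property can be illustrated with a toy recurrence. This is a minimal sketch, not the actual RWKV update rule and not the rwkv.cpp API: `toy_step`, `run`, the one-hot embedding, and the decay constant are all hypothetical and chosen only for illustration. It shows that each step consumes one token plus a fixed-size state, so per-token work and memory do not grow with the number of tokens already processed.

```python
def toy_step(token_id, state, decay=0.9):
    """One recurrent step: fold a token into a fixed-size state, then emit logits.

    Hypothetical toy update, NOT the real RWKV formula.
    """
    vocab_size = len(state)
    # One-hot "embedding" of the token (illustrative only).
    x = [1.0 if i == token_id else 0.0 for i in range(vocab_size)]
    # Exponentially decayed summary of everything seen so far.
    new_state = [decay * s + xi for s, xi in zip(state, x)]
    # Logits are computed from the new state alone; no earlier tokens are revisited.
    logits = list(new_state)
    return logits, new_state


def run(tokens, state=None, vocab_size=4):
    """Process tokens one at a time, carrying only the state between steps."""
    if state is None:
        state = [0.0] * vocab_size
    logits = None
    for token_id in tokens:
        logits, state = toy_step(token_id, state)
    return logits, state


# Processing a sequence in one go, or resuming from a saved state, gives the same result.
full_logits, full_state = run([0, 1, 1, 2])
_, saved_state = run([0, 1])
resumed_logits, resumed_state = run([1, 2], state=saved_state)
print(full_state == resumed_state)
```

Because the state has a fixed length, it can be saved after a prompt and reused later without reprocessing the earlier tokens. An attention-based model would instead need to keep keys and values for every previous position.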
Features
- Windows / Linux / MacOS
- Build the library yourself
- Get an RWKV model
- Requirements: Python 3.x with PyTorch and tokenizers
- ggml moves fast, and can occasionally break compatibility with older file formats
- Requirements: Python 3.x with PyTorch
License
MIT License