CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that generates residual vector quantization (RVQ) audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce the audio codes used for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms such as Hugging Face for testing. CSM runs on CUDA-enabled GPUs.
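To make the term "RVQ audio codes" concrete, here is a minimal toy sketch of residual vector quantization in plain Python. It is not CSM's codec or API: the codebooks, the two-dimensional input, and the function names `nearest`, `rvq_encode`, and `rvq_decode` are all invented for illustration. The idea it shows is that each stage quantizes whatever error the previous stage left behind, so one vector becomes a short list of integer codes.

```python
# Toy residual vector quantization (RVQ). Codebooks and values are made up
# for illustration; a real audio codec learns much larger codebooks.

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage codes the previous residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage sees only the error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Hypothetical two-stage codebooks: a coarse stage and a finer stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

codes = rvq_encode([1.2, 0.9], CODEBOOKS)
approx = rvq_decode(codes, CODEBOOKS)
print(codes, approx)
```

In this toy setup the list of indices plays the role of the "audio codes": a generator only has to predict small integers per stage, and a separate decoder turns them back into a signal.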
Features
- Generates high-quality speech from text and audio inputs.
- Uses a Llama backbone with an optimized audio decoder.
- Fine-tuned for interactive voice applications.
- Hosted models available for easy access and testing.
- Compatible with CUDA-enabled GPUs for fast performance.
- Easy to integrate and test using example scripts.
- Requires Python 3.10 and certain audio processing tools like ffmpeg.
- Customizable for various conversational contexts.
- Available under an Apache-2.0 license for open-source usage.
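The stated requirements (Python 3.10 and ffmpeg) can be checked before installing anything. The following is a minimal sketch using only the standard library. The function name `check_environment` and its parameters are hypothetical, and it assumes "Python 3.10" means 3.10 or newer, which the listing does not confirm. It does not check for a CUDA-enabled GPU, since that would require a GPU library to be installed first.

```python
import shutil
import sys

def check_environment(version=None, which=shutil.which):
    """Return a list of problems found; an empty list means the checks passed.

    `version` and `which` are injectable so the function can be tested
    without depending on the machine it runs on.
    """
    if version is None:
        version = sys.version_info[:2]
    problems = []
    # Assumption: 3.10 is treated as a minimum, not an exact requirement.
    if tuple(version) < (3, 10):
        problems.append(
            f"Python 3.10 or newer expected, found {version[0]}.{version[1]}"
        )
    # ffmpeg is looked up on PATH, the same way a shell would find it.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems

if __name__ == "__main__":
    issues = check_environment()
    if issues:
        for issue in issues:
            print("missing:", issue)
    else:
        print("Python and ffmpeg checks passed")
```

Run it as a script before setup; any line starting with `missing:` names a prerequisite to install first.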
License
Apache License V2.0