Open-source multi-speaker long-form text-to-speech model
Qwen3-ASR is an open-source series of ASR models
Multi-modal large language model designed for audio understanding
Omnilingual ASR: Open-Source Multilingual Speech Recognition
SOTA discrete acoustic codec models with 40/75 tokens per second
An Open Source text-to-speech system built by inverting Whisper
48 kHz stereo neural audio codec for general audio
Audio foundation model excelling in audio understanding
AudioMuse-AI is an Open Source Dockerized environment
Open Source Speech Language Model
Python Audio Analysis Library: Feature Extraction, Classification
Spark-TTS Inference Code
kaldi-asr/kaldi is the official location of the Kaldi project
A PyTorch-based Speech Toolkit
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
A subtitle generator for Japanese Adult Videos
VITS2 backbone with multilingual BERT
Headphone Correction and Spatial Audio on Headphones
Free, easy to use, lightweight soundboard for Windows
Open source software calculating industrial noise in the environment
Open source implementation of Microsoft's VALL-E X zero-shot TTS model
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
slab3d is a real-time virtual acoustic environment