Repo for the Qwen2-Audio chat and pretrained large audio language models
SoTA open-source TTS
A GPT-4o-Level MLLM for Vision, Speech and Multimodal Live Streaming
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Dia-1.6B generates lifelike English dialogue and vocal expressions