Fast and accurate automatic speech recognition (ASR) for edge devices
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Speech to Text to Speech, sends text as OSC messages
Robust Speech Recognition via Large-Scale Weak Supervision
Speech-to-text, text-to-speech, and speaker recognition
Framework for building real-time voice and multimodal AI agents
The behavior guidance framework for customer-facing LLM agents
Realtime AI Voice Agents with SoTA Multimodal AI models on Arduino ESP
Real-time voice interactive digital human
Open source AI VTuber platform with voice chat and Live2D avatars
Large Audio Language Model built for natural interactions
Build voice-based LLM agents. Modular + open source
In-App assistant SDK to build a multimodal conversational UX for websites
Conversational voice AI agents
Repo for the Qwen2-Audio chat and pretrained large audio language models
Fast multimodal LLM for real-time voice interaction and AI apps
Map location picker component for Android
TEN, a voice agent framework to create conversational AI
A free, open source, and extensible speech-to-text application
Multilingual speech recognition and audio understanding model
Assistant SDK to build a multimodal conversational UX for Android
In-App assistant SDK to build a multimodal conversational UX for iOS
Build your own AI friend
Bailing is a voice dialogue robot similar to GPT-4o
Deploy your private Gemini application for free with one click