Speech-to-text, text-to-speech, and speaker recognition
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Long-form streaming TTS system for multi-speaker dialogue generation
Interface for OuteTTS models
Collaborative document editing using Markdown
Clone a voice in 5 seconds to generate arbitrary speech in real time
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Towards Human-Level Text-to-Speech through Style Diffusion
Open-source multi-speaker long-form text-to-speech model
Sempare Template (scripting) Engine for Delphi
Recognition and resolution of numbers, units, date/time, etc.
A generative speech model for daily dialogue
A web UI for easy subtitle generation using the Whisper model
Official PyTorch Implementation
Text Generator is a handy plugin for Obsidian
High-performance inference server and API layer for text embedding models
A text editor in fewer than 1000 LOC with syntax highlighting and search
Completely customizable framework for building rich text editors
Self-hosted AI audio transcription
A playground to generate images from any text prompt using Stable Diffusion
Provides line-oriented text file editing capabilities
Translate a video from one language to another and embed dubbing
Oobabooga: the definitive web UI for local AI, with powerful features
An open-source implementation of NotebookLM with more flexibility
Hypernetworks that adapt LLMs for specific benchmark tasks