A Gradio web UI for running Large Language Models like LLaMA
Repo of Qwen2-Audio chat & pretrained large audio language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
Robust Speech Recognition via Large-Scale Weak Supervision
LilyPond sheet music text editor
An open source RDP server
Speech-to-text, text-to-speech, and speaker recognition
A safe home for all your data
A free, open source, and extensible speech-to-text application
A gallery that showcases on-device ML/GenAI use cases
A deep learning toolkit for Text-to-Speech, battle-tested in research
Speech recognition module for Python
Comprehensive Gradio WebUI for audio processing
Transcribe any audio to text, translate and edit subtitles 100% locally
Remote desktop and file transfer tool
Label Studio is a multi-type data labeling and annotation tool
Anki is a smart spaced repetition flashcard program
High-quality multi-lingual text-to-speech library by MyShell.ai
Capable of understanding text, audio, vision, video
Examples and guides for using the Gemini API
The most powerful screen recorder & annotation tool for Chrome
Subtitle Creation Assistant
Audiocraft is a library for audio processing and generation
Open Source AI Dictation App