Showing 122 open source projects for "voice to code"

  • 1
    Applio

    A simple, high-quality voice conversion tool focused on ease of use

    Applio is a high-quality voice conversion toolkit designed to make modern RVC/VITS-based voice cloning accessible to non-experts. It focuses strongly on ease of use: installation scripts for Windows, Linux, and macOS set up dependencies and then launch a browser-based Gradio interface. Within that interface, users can train and run voice conversion models for tasks like singing conversion, speech-to-speech transformation, and voice cloning.
    Downloads: 67 This Week
  • 2
    OpenVoice

    Instant voice cloning by MIT and MyShell. Audio foundation model

    ...Architecturally, OpenVoice separates “tone color” cloning from style control, which makes it easier to keep a consistent identity while flexibly changing prosody or language. The project provides open-weight models, inference code, and examples, making it suitable both for research and for building production voice experiences. It is actively developed by MyShell, which also integrates OpenVoice into broader agent and entertainment workflows.
    Downloads: 27 This Week
  • 3
    MegaTTS 3

    Official PyTorch Implementation

    ...The system supports both Chinese and English (with code-switching), making it versatile across languages, and offers controls for accent strength, voice similarity, intelligibility vs. similarity tradeoffs, and other speech parameters to fine-tune output.
    Downloads: 2 This Week
  • 4
    Rhino

    On-device Speech-to-Intent engine powered by deep learning

    Rhino is Picovoice's Speech-to-Intent engine. It infers intent directly from spoken commands within a given context of interest, in real time. It belongs to Picovoice's end-to-end platform for embedding private voice AI into any software in a few lines of code. The platform is modular: you can create use-case-specific voice AI models in seconds, develop voice features with cross-platform SDKs, and deliver voice AI on-device, on mobile, in web browsers, on-premise, or in the cloud. Measure adoption, learn, and iterate. ...
    Downloads: 0 This Week
  • 5
    CosyVoice

    Multi-lingual large voice generation model, providing inference

    CosyVoice is a multilingual large voice generation model that offers a full-stack solution for training, inference, and deployment of high-quality TTS systems. The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech across languages and in code-switching contexts. ...
    Downloads: 1 This Week
  • 6
    Happy Coder

    Mobile and Web client for Codex and Claude Code, with realtime voice

    Happy is an open-source, cross-platform mobile and web client designed to bring powerful AI coding agents such as Claude Code and Codex to your fingertips no matter where you are. At its core, Happy wraps existing AI coding tools with a unified interface, providing real-time voice interactions, encrypted communication, and seamless device switching between desktop and mobile. You can start a coding session locally through the Happy CLI or connect from a phone or browser, allowing developers to inspect, interact with, and guide the AI as it generates, tests, or explains code.
    Downloads: 21 This Week
  • 7
    Alan AI for Android

    Assistant SDK to build a multimodal conversational UX for Android

    Quickly add voice to your app with the Alan Platform. Create an in-app voice assistant to enable human-like conversations and provide a personalized voice experience for every user. Alan is a conversational voice AI platform that lets you create an intelligent voice assistant for your app. It offers all the necessary tools to design, embed, and host your voice solutions, including a powerful web-based IDE where you can write, test, and debug dialog scenarios for your voice assistant or chatbot. Alan's...
    Downloads: 0 This Week
  • 8
    peon-ping

    Warcraft III Peon voice notifications (+ more!) for Claude Code

    Peon-ping is a quirky utility that brings fun and practical voice notifications to your development workflow by using Warcraft III peon-style sound effects whenever significant events occur in your code editor or terminal. The project is built around the idea of reducing cognitive load by audibly alerting you when processes finish, tests fail, or language models complete responses, helping you stay focused without constantly watching the screen.
    Downloads: 4 This Week
  • 9
    PersonaPlex

    PersonaPlex code

    PersonaPlex is an open-source real-time conversational speech AI model that goes beyond traditional text chat by providing full-duplex speech-to-speech interaction, meaning it can listen and talk at the same time instead of waiting for you to finish speaking before responding. This architectural approach eliminates awkward pauses and makes conversations feel much more human-like, with natural behaviors such as overlapping speech, interruptions, and fluent turn-taking, traits that traditional...
    Downloads: 2 This Week
  • 10
    Spark TTS

    Spark-TTS Inference Code

    ...It uses an efficient single-stream architecture where speech tokens are directly reconstructed from the predictions of an LLM, removing the need for external acoustic models or complex vocoders and making the generation pipeline cleaner and faster. The project supports zero-shot voice cloning, meaning it can imitate a new speaker’s voice without dedicated training for that specific voice, and works across languages, including English and Chinese, even in cross-lingual code-switching scenarios. Spark-TTS allows users to control speech characteristics like gender, pitch, and speaking rate to customize synthesized output and support virtual speaker creation.
    Downloads: 0 This Week
  • 11
    CallMe

    Minimal plugin that lets Claude Code call you on the phone

    ...The plugin uses a local MCP server alongside a webhook tunnel (typically via ngrok) to connect with voice providers like Telnyx or Twilio for outbound calls, with prompts and responses flowing between Claude Code and the user’s voice device. Multi-turn conversations are supported, so users can respond in real time to questions the agent asks during execution, giving it a practical human-in-the-loop capability for complex workflows.
    Downloads: 0 This Week
  • 12
    Meshenger

    P2P Voice/Video phone App for local networks

    Meshenger is an open-source, serverless P2P voice and video calling app for Android that works over local networks or directly between devices without the internet. It facilitates direct communication using QR codes or IP addresses, bypassing the need for any central infrastructure or account registration. Meshenger is particularly suited for emergency scenarios, privacy-focused users, and mesh networks where conventional communication tools fail. It emphasizes minimalism, transparency, and...
    Downloads: 10 This Week
  • 13
    OpenAI.fm

    Code for openai.fm, a demo for the OpenAI Speech API

    OpenAI.fm is an official interactive demo application built to showcase the OpenAI Speech API and its advanced text-to-speech capabilities, providing developers and creators with a hands-on web interface to convert text into high-quality, customizable audio using state-of-the-art TTS models. Developed using Next.js and the OpenAI Speech API, this demo illustrates how the latest neural voice models can produce natural, expressive speech with adjustable styles and voices, highlighting features...
    Downloads: 25 This Week
  • 14
    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound classification, emotion, etc.), and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. ...
    Downloads: 1 This Week
  • 15
    TEN

    Open-source framework for conversational voice AI agents

    ...Using components like graph-based workflow design, drag-and-drop UI (via TMAN Designer), and reusable extensions such as real-time avatars, RAG (Retrieval-Augmented Generation), and image generation, TEN enables highly customizable, scalable agent development with minimal code.
    Downloads: 3 This Week
  • 16
    OpenAI Realtime Agents

    This is a simple demonstration of more advanced, agentic patterns

    This repository demonstrates how to build low-latency, streaming “voice + chat” agents using OpenAI’s Realtime API combined with the OpenAI Agents SDK. The demo shows patterns for connecting a realtime voice stream (audio in/out) with agents that can use tools, maintain state, and orchestrate multi-agent workflows. The SDK offers abstractions such as agent orchestration, event handling, handoffs, state management, and guardrails, tailored to support realtime, conversational systems. ...
    Downloads: 0 This Week
  • 17
    Big-AGI

    AI suite powered by state-of-the-art models and providing advanced AI

    ...It unifies access to multiple large language models (LLMs) and AI services through a modern web UI that emphasizes efficient interaction, flexibility, and extensibility, enabling users to conduct multi-model chats, execute code, generate images, and perform voice or text-based tasks all in one place. The workspace includes advanced features like Beam, which enables multi-model consensus and comparative responses to improve reliability and reduce hallucination, and robust persona management to tailor responses to specific roles or workflows. Big-AGI can be self-hosted or deployed in cloud environments, giving users full control over data and model access limits and avoiding vendor lock-in.
    Downloads: 1 This Week
  • 18
    Telegram Web A

    Telegram Web A, GPL v3

    ...It uses a custom front-end framework (called “Teact”) that re-implements React-style paradigms and pairs them with a custom version of the MTProto library (based on GramJS) to interact with Telegram’s backend infrastructure. The project achieved recognition (winning first prize in the Telegram Lightweight Client Contest) and serves as the code base behind the official web client available at web.telegram.org/a. The architecture takes advantage of advanced browser capabilities: WebSockets for real-time messaging, Web Workers and WebAssembly for performance-critical tasks, multi-level caching and PWA features for offline or near-offline usability, voice recording and media streaming, raw binary data handling and cryptographic operations. ...
    Downloads: 16 This Week
  • 19
    pyttsx3

    Offline text-to-speech synthesis for Python

    ...On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility. The library exposes a simple but flexible API for controlling voice selection, speaking rate, volume, and other synthesis parameters from Python code. It supports both a high-level speak convenience function and a lower-level engine object with event hooks, queuing, and saving output to audio files. The repository includes examples and documentation that show how to adjust properties dynamically, persist synthesized output, and integrate pyttsx3 into GUIs or background services.
    Downloads: 8 This Week
  • 20
    Aider

    Aider is AI pair programming in your terminal

    Aider is an AI pair programming tool that runs directly in your terminal, helping developers build new projects or extend existing codebases faster and more confidently. It works alongside you like a coding partner, using powerful large language models to understand your code and implement precise changes. Aider creates a structured map of your entire repository, allowing it to handle large and complex projects effectively. It supports over 100 programming languages, making it flexible for...
    Downloads: 8 This Week
  • 21
    SimpleX

    The first messaging platform operating without user identifiers

    Other apps have user IDs: Signal, Matrix, Session, Briar, Jami, Cwtch, etc. SimpleX does not, not even random numbers. This radically improves your privacy. The video shows how you connect to your friend via their one-time QR code, in person or via a video link. You can also connect by sharing an invitation link. Temporary anonymous pairwise identifiers: SimpleX uses temporary anonymous pairwise addresses and credentials for each user contact or group member. It allows delivering messages...
    Downloads: 43 This Week
  • 22
    AI Runner

    Offline inference engine for art, real-time voice conversations

    AI Runner is an offline inference engine designed to run a collection of AI workloads on your own machine, including image generation for art, real-time voice conversations, LLM-powered chatbots and automated workflows. It is implemented as a desktop-oriented Python application and emphasizes privacy and self-hosting, allowing users to work with text-to-speech, speech-to-text, text-to-image and multimodal models without sending data to external services. At the core of its LLM stack is a mode-based architecture with specialized “modes” such as Author, Code, Research, QA and General, and a workflow manager that automatically routes user requests to the right agent based on the task. ...
    Downloads: 7 This Week
  • 23
    DiscordGo

    (Golang) Go bindings for Discord

    ...The DiscordGo code is fairly well documented at this point and is currently the only documentation available. The Go reference (below) presents that information in a nice format. This library and the Discord API are unfinished. Because of that, there may be major changes to the library in the future.
    Downloads: 0 This Week
  • 24
    Story Flicks

    Generate high-definition story short videos with one click using AI

    ...For creators who want to produce narrative short-form content — whether for social media, storytelling, or prototyping video ideas — story-flicks offers a lightweight, code-backed alternative to complex video editing suites. Because the project is open and modifiable, developers can customize the generation pipeline: adjust story structure, alter rendering parameters, tweak video quality or resolution, or integrate with other AI models (e.g. for audio, voice-over, or image-to-video). It’s especially useful as a starting template or experimentation ground for developers building automated content-creation tools.
    Downloads: 5 This Week
  • 25
    Lumo Android App

    Android application for Proton Lumo

    Lumo Android App is the official Android client implementation of Lumo, a privacy-first AI assistant created by Proton that lets users interact with an intelligent chatbot securely and confidentially on mobile devices. Lumo is designed so that every conversation remains encrypted and private, meaning chats are not logged, tracked, or used to train external large language models, and all interactions are protected with zero-access encryption so only the user can read them. The Android app...
    Downloads: 1 This Week