Showing 624 open source projects for "audio synthesis"

View related business solutions
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    Step-Audio

    Step-Audio

    Open-source framework for intelligent speech interaction

    Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and produces speech accordingly, enabling natural dialogue, voice cloning, and expressive speech synthesis. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    MLX-Audio

    MLX-Audio

    A text-to-speech, speech-to-text and speech-to-speech library

    ...The project provides a straightforward CLI (mlx_audio.tts.generate) as well as a Python API for programmatic generation of audio, including parameters for voice choice, speed, language hints, output format, and sample rate. It includes examples such as audiobook generation to demonstrate long-form synthesis and joined audio segments. On top of that, MLX-Audio offers a modern web interface powered by FastAPI, with real-time waveform and 3D visualizations, file upload, and audio management.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 3
    SuperCollider

    SuperCollider

    Audio server, programming language, and IDE for sound synthesis

    SuperCollider is a platform for audio synthesis and algorithmic composition, used by musicians, artists, and researchers working with sound. It is free and open source software available for Windows, macOS, and Linux. scsynth, a real-time audio server, forms the core of the platform. It features 400+ unit generators (“UGens”) for analysis, synthesis, and processing.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Fun Audio Chat

    Fun Audio Chat

    Large Audio Language Model built for natural interactions

    Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. The system...
    Downloads: 1 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    Step-Audio 2

    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    Step-Audio2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation: unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio2 processes raw audio, reasons about semantic and paralinguistic content (like emotion, speaker characteristics, non-verbal cues), and can generate contextually appropriate responses — including potentially generating or transforming audio output. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Voicebox

    Voicebox

    The open-source voice synthesis studio powered by Qwen3-TTS

    Voicebox is a local-first voice synthesis studio that aims to bring professional, DAW-like voice generation workflows to a desktop app while keeping models and voice data entirely on your machine. It positions itself as an open-source alternative to cloud voice platforms by emphasizing privacy, offline use, and freedom from subscriptions or usage caps. The tool supports downloading voice models, cloning voices from short audio samples, and generating speech locally, then organizing the results using studio-oriented editing concepts. ...
    Downloads: 77 This Week
    Last Update:
    See Project
  • 7
    FluidSynth

    FluidSynth

    Software synthesizer based on the SoundFont 2 specifications

    FluidSynth is a real-time software synthesizer based on the SoundFont 2 specifications and has reached widespread distribution. FluidSynth itself does not have a graphical user interface, but due to its powerful API several applications utilize it and it has even found its way onto embedded systems and is used in some mobile apps.
    Downloads: 52 This Week
    Last Update:
    See Project
  • 8
    Sonic Pi

    Sonic Pi

    Sonic Pi is your free code-based music creation and performance tool

    Sonic Pi is a new kind of musical instrument. Instead of strumming strings or whacking things with sticks - you write code, live. Sonic Pi is a complete open source programming environment originally designed to explore and teach programming concepts within schools through the process of creating new sounds. In addition to being an engaging education resource it has evolved into an extremely powerful and performance-ready live coding instrument suitable for professional artists and DJs....
    Downloads: 23 This Week
    Last Update:
    See Project
  • 9
    Overtone

    Overtone

    Collaborative programmable music

    Overtone is an open-source audio environment designed to explore new musical ideas from synthesis and sampling to instrument building, live coding and collaborative jamming. We combine the powerful SuperCollider audio engine, with Clojure, a state-of-the-art lisp, to create an intoxicating interactive sonic experience. Synchronize your visuals and noise with ease.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Catch Bugs Before Your Customers Do Icon
    Catch Bugs Before Your Customers Do

    Real-time error alerts, performance insights, and anomaly detection across your full stack. Free 30-day trial.

    Move from alert to fix before users notice. AppSignal monitors errors, performance bottlenecks, host health, and uptime—all from one dashboard. Instant notifications on deployments, anomaly triggers for memory spikes or error surges, and seamless log management. Works out of the box with Rails, Django, Express, Phoenix, Next.js, and dozens more. Starts at $23/month with no hidden fees.
    Try AppSignal Free
  • 10
    Hugging Face - Speech To Speech

    Hugging Face - Speech To Speech

    Open speech-to-speech models and pipelines by Hugging Face toolkit AI

    This project from Hugging Face focuses on enabling direct speech-to-speech processing using modern machine learning models. It provides tools and reference implementations that allow audio input to be transformed into audio output without requiring an intermediate text representation. Hugging Face - Speech To Speech builds on recent advances in speech modeling, combining components such as speech recognition, translation, and synthesis into unified pipelines. It is designed to help researchers and developers experiment with multilingual and cross-lingual voice applications. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Faust

    Faust

    Functional programming language for signal processing

    Faust (Functional Audio Stream) is a functional programming language for sound synthesis and audio processing with a strong focus on the design of synthesizers, musical instruments, audio effects, etc. Faust targets high-performance signal processing applications and audio plug-ins for a variety of platforms and standards. The core component of Faust is its compiler.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    RHVoice

    RHVoice

    Free open source speech synthesizer for Russian and other languages

    RHVoice is a free and open-source multilingual speech synthesizer. Its developers hope to give more visually impaired people the ability to use a good free synthesis voice reading in their native language with their screen reader. We are especially interested in supporting those languages for which there are currently no good voices that could be used with a screen reader. The creator of RHVoice, Olga Yakovleva, is blind herself. Many of the contributors to the RHVoice project, both...
    Downloads: 46 This Week
    Last Update:
    See Project
  • 13
    Pipecat

    Pipecat

    Framework for building real-time voice and multimodal AI agents

    Pipecat is an open source Python framework designed for building real-time voice and multimodal conversational AI agents. It provides developers with tools to orchestrate complex pipelines that combine speech recognition, language models, audio processing, and speech synthesis into a cohesive conversational system. Pipecat focuses on low-latency interactions so voice conversations with AI feel natural and responsive during live use. Pipecat allows applications to integrate multiple AI services and transports, enabling flexible deployment across different environments and communication channels. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    TADA

    TADA

    Open Source Speech Language Model

    ...This approach can support applications such as conversational AI, speech synthesis, multimodal language modeling, and speech understanding systems. The project explores ways to treat speech and text as integrated data streams rather than separate pipelines, enabling more coherent interactions between language and audio. Because it operates as a generative framework, TADA can be used for research into advanced speech-language models and multimodal artificial intelligence systems.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    GLM-TTS

    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice character even for unseen speakers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    ...Very strong benchmark performance across modalities (audio understanding, speech recognition, image/video reasoning) and often outperforming or matching single-modality models at a similar scale. Real-time streaming responses, including natural speech synthesis (text-to-speech) and chunked inputs for low latency interaction.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Riffusion App

    Riffusion App

    Stable diffusion for real-time music generation (web app)

    Riffusion App Hobby is an open-source interactive web application that enables real-time music generation using stable diffusion models adapted for audio synthesis. Unlike traditional music generation tools, it treats audio as spectrogram images and applies diffusion techniques to generate continuous sound transitions, allowing users to create evolving musical loops and compositions. The application is built with modern web technologies including Next.js, React, and three.js, providing a responsive and visually engaging interface for experimentation. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    Podcastfy.ai

    Podcastfy.ai

    Transforming Multimodal Content into Captivating Multilingual Audio

    Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, youtube videos as well as images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources enabling customization and scale.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    pyVideoTrans

    pyVideoTrans

    Translate the video from one language to another and embed dubbing

    pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 20
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 21
    pyttsx3

    pyttsx3

    Offline Text To Speech synthesis for python

    ...On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility. The library exposes a simple but flexible API for controlling voice selection, speaking rate, volume, and other synthesis parameters from Python code. It supports both a high-level speak convenience function and a lower-level engine object with event hooks, queuing, and saving output to audio files. The repository includes examples and documentation that show how to adjust properties dynamically, persist synthesized output, and integrate pyttsx3 into GUIs or background services.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 22
    AudioKit

    AudioKit

    Swift audio synthesis, processing, & analysis platform

    AudioKit is an entire audio development ecosystem of code repositories, packages, libraries, algorithms, applications, playgorunds, tests, and scripts, built and used by a community of audio programmers, app developers, engineers, researchers, scientists, musicians, gamers, and people new to programming. An important goal for AudioKit is to allow it to grow and be maintainable by a handful of volunteers. For this reason we have extensive tests that are run whenever changes are made to any...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    HY-World 1.5

    HY-World 1.5

    A Systematic Framework for Interactive World Modeling

    ...It blends advanced reasoning with multimodal synthesis, enabling agents to describe scenes, generate context-appropriate responses, and contribute to narrative or gameplay flows. The underlying framework typically supports large-context state tracking across extended interactions, blending temporal and spatial multimodal signals.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 24
    SoniTranslate

    SoniTranslate

    Synchronized Translation for Videos

    SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio, allowing users to upload a video, choose source and target languages, and then run a pipeline that handles transcription, translation and re-synthesis of speech. Under the hood, it uses advanced speech and diarization models to separate speakers, align audio with timecodes and respect subtitle timing, which lets the generated dub track stay in sync with the original video structure. ...
    Downloads: 30 This Week
    Last Update:
    See Project
  • 25
    bfxr

    bfxr

    Flash + AIR sound effects generator. Based on Sfxr.

    The bfxr project by increpare is a sound-effects generator tool originally built using Flash + AIR, based on the earlier Sfxr project. Its purpose is to enable users, especially game developers and sound designers, to quickly generate retro, 8-bit/“chiptune” style sound effects (“bleeps”, “booms”, “zaps”, etc.) without deep knowledge of audio signal processing. It offers an interactive GUI through which you can tweak many parameters (oscillators, envelopes, filters, etc.) to sculpt custom...
    Downloads: 10 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB