Showing 16 open source projects for "real-time vocoder"

  • 1
    Real-Time Voice Cloning

    Clone a voice in 5 seconds to generate arbitrary speech in real-time

    Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder.
    Downloads: 8 This Week
  • 2
    WhisperSpeech

    An Open Source text-to-speech system built by inverting Whisper

    WhisperSpeech is an open-source text-to-speech system created by “inverting” OpenAI’s Whisper, reusing its strengths as a semantic audio model to generate speech instead of only transcribing it. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data. Its architecture follows a token-based, multi-stage pipeline inspired by AudioLM and SPEAR-TTS:...
    Downloads: 0 This Week
  • 3
    Parallel WaveGAN

    Unofficial Parallel WaveGAN

    Parallel WaveGAN is an unofficial PyTorch implementation of several state-of-the-art non-autoregressive neural vocoders, centered on Parallel WaveGAN but also including MelGAN, Multiband-MelGAN, HiFi-GAN, and StyleMelGAN. Its main goal is to provide a real-time neural vocoder that can turn mel spectrograms into high-quality speech audio efficiently. The repository is designed to work hand-in-hand with ESPnet-TTS and NVIDIA Tacotron2-style front ends, so you can build complete TTS or singing voice synthesis pipelines. It includes a large collection of “Kaldi-style” recipes for many datasets such as LJSpeech, LibriTTS, VCTK, JSUT, CMU Arctic, and multiple singing voice corpora in Japanese, Mandarin, Korean, and more. ...
    Downloads: 0 This Week
  • 4
    Nyquist

    Nyquist is a language for sound synthesis and music composition.

    Nyquist is a language for sound synthesis and music composition. It is implemented in C and C++ and runs on Win32, OSX, and Linux. Nyquist combines a powerful functional programming style with efficient signal-processing primitives. Nyquist is also embedded as a scripting language in Audacity.
    Downloads: 36 This Week
  • 5
    Lyra

    A Very Low-Bitrate Codec for Speech Compression

    Lyra is a neural audio codec designed to deliver intelligible, natural-sounding speech at extremely low bitrates, making real-time communication viable on constrained networks. It replaces hand-engineered codecs with learned models that capture speech characteristics more efficiently and reconstruct waveforms with a neural vocoder. The system targets mobile-class hardware, balancing latency and quality so it can run in real time on phones. Its architecture is resilient to packet loss and jitter through framing strategies and error concealment, helping conversations remain understandable under adverse conditions. ...
    Downloads: 1 This Week
  • 6
    radio_vocoder_FFT

    a vocoder + equalizer + FFT effects version of radio_chung

    radio vocoder chung is a version of radio chung, a free player for internet radio stream URLs and audio files by generic path (*, mp3, *name*.ogg, wav, ...), with a vocoder, linear equalizer(s), and FFT effect(s) added. It includes DSP effects (baxandall, resonance, automod, decay, flat, noisered, speed, feedback) and uses bass.dll, gui_chung, FFTdll.dll (fast Fourier transform), and FreeBASIC. It offers high-quality small pitch shifting for radio URLs. Added: record, playrec, save as MP3, feedback, anticlick.
    Downloads: 0 This Week
  • 7
    Mocking Bird

    Clone a voice in 5 seconds to generate arbitrary speech in real-time

    MockingBird is an open-source voice cloning and real-time speech generation toolkit that lets you clone a speaker’s voice from a short audio sample (reportedly as little as 5 seconds) and then synthesize arbitrary speech in that voice. It builds on deep-learning based TTS / voice-cloning technology (in the lineage of projects such as Real-Time-Voice-Cloning), but extends it with support for Mandarin Chinese and multiple Chinese speech datasets — broadening its applicability beyond English. ...
    Downloads: 2 This Week
  • 8
    VoiceFixer

    General Speech Restoration

    VoiceFixer is a machine-learning framework for “speech restoration”: given a degraded or distorted audio recording — with noise, clipping, low sampling rate, reverberation, or other artifacts — it attempts to recover high-fidelity, clean speech. The architecture works in two stages: first an analysis stage that tries to extract “clean” intermediate features from the noisy audio (e.g. removing noise, denoising, dereverberation, upsampling), and then a neural vocoder-based synthesis stage that...
    Downloads: 5 This Week
  • 9
    TensorFlowTTS

    Real-Time State-of-the-art Speech Synthesis for Tensorflow 2

    ...It offers a variety of architectures for text-to-speech, including classic and modern models such as Tacotron-2, FastSpeech / FastSpeech2, and neural vocoders like MelGAN and Multiband-MelGAN. Because it’s based on TensorFlow 2, it can leverage optimizations such as fake-quantization-aware training and pruning, which allow models to run faster than real time and to be deployed on mobile or embedded platforms. The library supports multiple languages (English, French, Korean, Chinese, German, etc.) and is relatively easy to adapt to new languages. With integrated vocoder + mel-spectrogram generation pipelines, pre-trained models, and a fairly flexible architecture, TensorFlowTTS is an off-the-shelf, extensible TTS engine for applications ranging from voice assistants to content generation or accessibility tools.
    Downloads: 0 This Week
  • 10
    HiFi-GAN

    Generative Adversarial Networks for Efficient and High Fidelity Speech

    ...In experiments on LJSpeech, HiFi-GAN was shown to achieve mean opinion scores close to human recordings while synthesizing 22.05 kHz audio up to ~168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades a bit of quality for even higher speed and can run more than 13× faster than real time on CPU, making it suitable for deployment scenarios without powerful GPUs.
    Downloads: 0 This Week
  • 11
    phasevocoder

    phase vocoder for time scaling and pitch transposition etc.

    phasevocoder: phase vocoder for time scaling and pitch transposition etc. Copyright (c) 2008-2020 by Klaus Michael Indlekofer. All rights reserved. Note: Special restrictions apply. See disclaimers below and within the distribution. (We are not affiliated in any way with companies/persons mentioned on this page. All brand names and trademarks are property of their respective owners.)
    Downloads: 0 This Week
  • 12
    Tacotron-2

    DeepMind's Tacotron-2 Tensorflow implementation

    ...It includes directory layouts and logging directories for multiple datasets such as LJSpeech and M-AILABS en_US/en_UK, making it easier to adapt to new English corpora. Separate log trees track mel-spectrograms, attention plots, evaluation audio, and vocoder outputs, so you can inspect how alignment and audio quality evolve over time.
    Downloads: 0 This Week
  • 13
    WaoN is a Wave-to-Notes transcriber (it converts an audio file into a MIDI file) with some utility tools, such as gWaoN (graphical visualization of the spectra) and a phase vocoder for time-stretching and pitch-shifting.
    Downloads: 13 This Week
  • 14
    A configurable, real-time vocoder using ALSA.
    Downloads: 0 This Week
  • 15
    Vintage Vocoder real-time audio effect - VST and DXI plug-in for PC/MAC. Originally a commercial product published by Sonicism Digital Audio Solutions in 2002. This software was used for the robot voices and sound effects in the computer game Freelancer.
    Downloads: 7 This Week
  • 16
    Sculptor is a phase-vocoder-based package with real-time capabilities. You can use it to fiddle with sound files in the frequency domain, changing pitch and duration independently.
    Downloads: 0 This Week
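
Several entries above (phasevocoder, WaoN, Sculptor) rely on a phase vocoder to change duration independently of pitch. The following is a minimal NumPy sketch of that general technique, not the code of any listed project. The function names, the 1024-sample frame, and the 256-sample hop are assumptions chosen for illustration.

```python
import numpy as np


def _hann(n_fft):
    # Periodic Hann window (hypothetical helper for this sketch).
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)


def stft(x, n_fft, hop):
    """Short-time Fourier transform: one row of rfft bins per frame."""
    if len(x) < n_fft + hop:
        raise ValueError("signal too short for the chosen frame size")
    window = _hann(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft, hop):
    """Windowed overlap-add resynthesis with a constant gain correction."""
    window = _hann(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(n_fft + hop * (len(frames) - 1))
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
    # Steady-state sum of squared windows; the signal edges stay faded.
    return out / (np.sum(window ** 2) / hop)


def time_stretch(x, rate, n_fft=1024, hop=256):
    """Change duration by 1/rate while keeping pitch (rate < 1 lengthens)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    spec = stft(np.asarray(x, dtype=float), n_fft, hop)
    n_frames, n_bins = spec.shape
    # Phase advance per hop expected at each bin's centre frequency.
    omega = 2 * np.pi * hop * np.arange(n_bins) / n_fft
    # Fractional positions in the analysis frames to read from.
    steps = np.arange(0, n_frames - 1, rate)
    phase = np.angle(spec[0])
    out = np.empty((len(steps), n_bins), dtype=complex)
    for k, t in enumerate(steps):
        i = int(t)
        frac = t - i
        # Interpolate magnitude between the two neighbouring frames.
        mag = (1 - frac) * np.abs(spec[i]) + frac * np.abs(spec[i + 1])
        out[k] = mag * np.exp(1j * phase)
        # Measured deviation from the expected advance, wrapped to [-pi, pi].
        dphi = np.angle(spec[i + 1]) - np.angle(spec[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        # Accumulate phase so each bin keeps its instantaneous frequency.
        phase += omega + dphi
    return istft(out, n_fft, hop)


# Usage: stretch one second of a 440 Hz tone to roughly twice its length.
sample_rate = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)
stretched = time_stretch(tone, rate=0.5)
print(len(tone), len(stretched))
```

Pitch transposition can then be obtained by time-stretching and resampling the result back to the original length. A real-time implementation would process frames as they arrive instead of transforming the whole signal at once.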
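
The HiFi-GAN entry above states speed as a multiple of real time. As a small worked example of that arithmetic, the sketch below converts such a factor into compute time and sample throughput. The 22.05 kHz rate and the ~168× (GPU) and 13× (CPU) factors come from the listing; the 10-second clip length and the function names are assumptions made for illustration.

```python
def synthesis_time(audio_seconds, speedup):
    """Seconds of compute needed to produce `audio_seconds` of audio
    when synthesis runs `speedup` times faster than real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def samples_per_second(sample_rate_hz, speedup):
    """Audio samples generated per second of compute."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return sample_rate_hz * speedup


SAMPLE_RATE_HZ = 22_050   # from the listing
GPU_SPEEDUP = 168.0       # approximate figure from the listing (V100 GPU)
CPU_SPEEDUP = 13.0        # lower bound from the listing ("more than 13x")
CLIP_SECONDS = 10.0       # assumed clip length, for illustration only

for name, speedup in (("GPU", GPU_SPEEDUP), ("CPU", CPU_SPEEDUP)):
    seconds = synthesis_time(CLIP_SECONDS, speedup)
    rate = samples_per_second(SAMPLE_RATE_HZ, speedup)
    print(f"{name}: {seconds:.3f} s of compute, {rate:,.0f} samples/s")
```

Under these figures, a 10-second clip would take about 0.06 s on the GPU and under 0.77 s on the CPU.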