text to audio free download

Showing 630 open source projects for "text to audio"

View related business solutions

Level Up Your Cyber Defense with External Threat Management
See every risk before it hits. From exposed data to dark web chatter. All in one unified view.

Move beyond alerts. Gain full visibility, context, and control over your external attack surface to stay ahead of every threat.

Try for Free
Gen AI apps are built with MongoDB Atlas
The database for AI-powered applications.

MongoDB Atlas is the developer-friendly database used to build, scale, and run gen AI and LLM-powered apps—without needing a separate vector database. Atlas offers built-in vector search, global availability across 115+ regions, and flexible document modeling. Start building AI apps faster, all in one place.

Start Free
1

Text Generation Web UI

A gradio web UI for running Large Language Models like LLaMA

A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Dropdown menu for switching between models. Notebook mode that resembles OpenAI's playground. Chat mode for conversation and role playing. Instruct mode compatible with Alpaca and Open Assistant formats. Nice HTML output for GPT-4chan. Markdown output for GALACTICA, including LaTeX rendering. Custom chat characters. Advanced chat features (send images, get audio responses with TTS). Very...

Downloads: 14 This Week

Last Update: 2025-10-23
See Project
2

Qwen2-Audio

Repo of Qwen2-Audio chat & pretrained large audio language model

Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound...

Downloads: 4 This Week

Last Update: 2025-09-23
See Project
3

Qwen-Audio

Chat & pretrained large audio language model proposed by Alibaba Cloud

Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and output text. There is also an instruction-tuned version called Qwen-Audio-Chat which supports conversational interaction (multi-round), audio + text input, creative tasks and reasoning over audio. It uses multi-task training over many different audio tasks (30+), and achieves strong multi-benchmarks performance...

Downloads: 2 This Week

Last Update: 2025-09-23
See Project
4

Audio/Video_ToText_OpenAI_Whisper

Transcripciones con Whisper Esta aplicación de escritorio basada en web permite transcribir (o transcribir y traducir al ingles), archivos de audio o video utilizando el modelo Whisper de OpenAI. Transcriptions with Whisper This web-based desktop application allows you to transcribe—or both transcribe and translate into English—audio or video files using OpenAI's Whisper model.

Downloads: 2 This Week

Last Update: 2025-04-11
See Project
Build Securely on AWS with Proven Frameworks
Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.

Download Now
5

Whisper

Robust Speech Recognition via Large-Scale Weak Supervision

OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented...

Downloads: 73 This Week

Last Update: 2025-06-26
See Project
6

Frescobaldi

LilyPond sheet music text editor

Frescobaldi is a free and open source LilyPond sheet music text editor. Designed to be powerful yet lightweight and easy-to-use, Frescobaldi offers great functionality and a host of useful features such as music view with advanced two-way Point & Click, Midi capturing to enter music, a Snippet Manager and many more. Frescobaldi is named after Girolamo Frescobaldi (1583-1643), an Italian composer of keyboard music in the late Renaissance and early Baroque period.

2 Reviews

Downloads: 45 This Week

Last Update: 2025-08-08
See Project
7

xrdp

An open source RDP server

xrdp provides a graphical login to remote machines using RDP (Microsoft Remote Desktop Protocol). xrdp accepts connections from a variety of RDP clients: FreeRDP, rdesktop, NeutrinoRDP and Microsoft Remote Desktop Client (for Windows, macOS, iOS and Android). As Windows-to-Windows Remote Desktop can, xrdp supports not only graphics remoting but also two-way clipboard transfer (text, bitmap, file), audio redirection, drive redirection (mount local client drives on a remote machine). Connect...

Downloads: 59 This Week

Last Update: 2025-07-07
See Project
8

sherpa-onnx

Speech-to-text, text-to-speech, and speaker recognition

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.

Downloads: 26 This Week

Last Update: 2025-10-24
See Project
9

Nextcloud Server

A safe home for all your data

Nextcloud server is a free and open source server software that allows you to store all of your data in a server of your choosing. With Nextcloud you can easily access and store data in the data center you trust, sync data among various devices, and share your data for collaboration purposes. It offers the best security in the self hosted file sync and share world, and is expandable with hundreds of apps.

Downloads: 45 This Week

Last Update: 2025-10-23
See Project
Get the most trusted enterprise browser
Advanced built-in security helps IT prevent breaches before they happen

Defend against security incidents with Chrome Enterprise. Create customizable controls, manage extensions and set proactive alerts to keep your data and employees protected without slowing down productivity.

Download Chrome
10

Handy STT

A free, open source, and extensible speech-to-text application

Handy is a free, open-source, offline speech-to-text application built for privacy, accessibility, and extensibility. Developed using Tauri (Rust + React/TypeScript), it runs natively across Windows, macOS, and Linux while performing local speech recognition without sending any audio to cloud servers. Handy allows users to start transcription instantly using a configurable keyboard shortcut—press to record, release to transcribe—and automatically pastes the resulting text into any active text...

Downloads: 20 This Week

Last Update: 2025-10-18
See Project
11

Google AI Edge Gallery

A gallery that showcases on-device ML/GenAI use cases

Gallery is a curated collection of on-device machine learning examples, demo apps, and model artifacts designed to help developers experiment with and deploy ML at the edge. The project bundles runnable samples that show how to run TensorFlow Lite/Edge TPU models (and similar lightweight runtimes) on mobile and embedded platforms, demonstrating common tasks like image classification, object detection, audio recognition, and pose estimation. Each sample is intended to be both a learning aid...

Downloads: 23 This Week

Last Update: 2025-09-19
See Project
12

Coqui TTS

A deep learning toolkit for Text-to-Speech, battle-tested in research

TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. TTS comes with pre-trained models, tools for measuring dataset quality and is already used in 20+ languages for products and research projects. High-performance Deep Learning models for Text2Speech tasks. Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech). Speaker Encoder to compute speaker embeddings...

Downloads: 21 This Week

Last Update: 2023-12-12
See Project
13

SpeechRecognition

Speech recognition module for Python

Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details). Listening to a microphone in the background, various other useful recognizer features. The easiest way to install this is using pip...

Downloads: 12 This Week

Last Update: 2025-05-12
See Project
14

Voice-Pro

Comprehensive Gradio WebUI for audio processing

Voice-Pro is the best gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.

1 Review

Downloads: 10 This Week

Last Update: 2025-10-05
See Project
15

Whishper

Transcribe any audio to text, translate and edit subtitles 100% locall

Open-source, local-first audio transcription and subtitling suite with a simple web UI. Thanks to open-source technologies, Whishper can run 100% offline. Your data never leaves your computer. Whishper allows you to translate your transcriptions to and from more than 60 languages thanks to Argos Translate and LibreTranslate. Download the transcriptions in many formats (json, txt, vtt, srt). Easily edit your subtitles right in the Web-UI.

Downloads: 9 This Week

Last Update: 2024-09-10
See Project
16

Aspia

Remote desktop and file transfer tool

Free open-source application for real-time desktop remote control and file transfer. With Aspia, you can create your own NAT traversal infrastructure (using Router and Relay servers) with connection by ID or use direct connections. Aspia supports many features. Among them, detailed information about the system, task manager, audio, and text chat. It is safe. All transmitted data is encrypted. Add computers for quick connection, and create computer groups. Encryption of address books...

Downloads: 17 This Week

Last Update: 2024-07-03
See Project
17

Label Studio

Label Studio is a multi-type data labeling and annotation tool

The most flexible data annotation tool. Quickly installable. Build custom UIs or use pre-built labeling templates. Detect objects on image, bboxes, polygons, circular, and keypoints supported. Partition image into multiple segments. Use ML models to pre-label and optimize the process. Label Studio is an open-source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can...

Downloads: 13 This Week

Last Update: 2025-09-30
See Project
18

Anki

Anki is a smart spaced repetition flashcard program

Anki is a free, open-source spaced repetition flashcard application designed for efficient long‑term memorization. It supports a wide variety of media types (text, images, audio, LaTeX), advanced scheduling algorithms (SM‑2, FSRS), and extensibility via add‑ons. It’s widely used for education, language learning, medical training, and more.

Downloads: 13 This Week

Last Update: 2025-09-17
See Project
19

MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai

MeloTTS is an open-source text-to-speech (TTS) system that generates natural-sounding speech from text input. It utilizes advanced machine-learning models to produce high-quality audio outputs.

Downloads: 5 This Week

Last Update: 2025-01-06
See Project
20

Qwen2.5-Omni

Capable of understanding text, audio, vision, video

Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds state...

Downloads: 5 This Week

Last Update: 2025-09-23
See Project
21

Cookbook (Google Gemini)

Examples and guides for using the Gemini API

The Gemini Cookbook is an official repository of examples and guides for using Google’s Gemini API. It provides a structured learning path with quick-start tutorials for beginners and practical examples for advanced users. The repository covers a wide range of Gemini capabilities, including text, images, video, speech, robotics, and multimodal interactions. It highlights newly introduced features such as Gemini 2.5 models (Flash and Pro), Gemini’s native image generation, Veo for video...

Downloads: 10 This Week

Last Update: 3 hours ago
See Project
22

Screenity

The most powerful screen recorder & annotation tool for Chrome

Screenity is a feature-packed screen and camera recorder for Chrome. Annotate your screen to give feedback, emphasize your clicks, edit your recording, and much more. Make unlimited recordings of your tab, desktop, any application, and camera. Annotate by drawing anywhere on the screen, adding text, and creating arrows. Highlight your clicks, focus on your mouse, or hide it from the recording. Individual microphone and computer audio controls, push to talk, and more. Custom countdowns, show...

1 Review

Downloads: 10 This Week

Last Update: 4 days ago
See Project
23

Translate-Subtitle-File

Subtitle Creation Assistant

Subtitle group machine translation assistant - [Function 1: Translate subtitle file] .srt .ass .vtt [Function 2: Voice to text] (Drag in video or audio to recognize subtitles) (The latest version v4.1.0 Update time 2021 2 May 23) 12 translation service providers can be configured, such as Google, Baidu, Tencent, Caiyun, IBM, Azure, Amazon, etc. (6 voice service providers can be configured: Alibaba Cloud, Xunfei, Tencent Cloud, IBM, Azure, Amazon ) Advantages: 1. You can use multiple service...

Downloads: 6 This Week

Last Update: 2025-09-09
See Project
24

AudioCraft

Audiocraft is a library for audio processing and generation

AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides...

Downloads: 2 This Week

Last Update: 2025-10-13
See Project
25

Amical

Open Source AI Dictation App

Amical is an open source, AI-powered desktop dictation and note-taking application that enables users to dictate hands-free, transcribe meetings, and capture notes effortlessly with unmatched speed, accuracy, and privacy. It leverages both local and cloud-based AI models, letting users seamlessly switch between providers for the ideal balance of speed, precision, and control, and understands the context of each app in use to automatically format text in a tone and style appropriate...

Downloads: 7 This Week

Last Update: 2025-09-20
See Project