Search Results for "batch normalize audio"

Showing 169 open source projects for "batch normalize audio"

1

FFmpeg Batch AV Converter

Free all in one audio/video ffmpeg batch encoder

FFmpeg Batch AV Converter is a free universal audio and video encoder for Windows and Linux (via Wine) that exposes the full potential of the ffmpeg command line with a few mouse clicks, in a convenient GUI with drag and drop and progress information. Wizards make things easy for non-experts. Thanks to its multi-file encoding feature, it may be the fastest a/v batch encoder available, since it maximizes system resource usage by launching as many simultaneous processes as the user's CPU thread count allows. ...

32 Reviews

Downloads: 2,677 This Week

Last Update: 13 hours ago
See Project
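
The multi-file approach described above (one encoder process per file, capped at the CPU thread count) can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's own code: the helper names `build_command`, `worker_count` and `normalize_batch` are hypothetical, and the sketch assumes an `ffmpeg` binary on the PATH and uses its `loudnorm` filter for the normalization step. The output directory should differ from the input directory so that sources are not overwritten.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(src: Path, dst_dir: Path) -> list[str]:
    """Build one ffmpeg invocation that loudness-normalizes a single file."""
    dst = dst_dir / src.name
    # -y overwrites an existing output file; loudnorm is ffmpeg's EBU R128 filter.
    return ["ffmpeg", "-y", "-i", str(src), "-af", "loudnorm", str(dst)]


def worker_count(n_files: int) -> int:
    """One process per file, capped at the CPU thread count (at least 1)."""
    return max(1, min(n_files, os.cpu_count() or 1))


def run_one(cmd: list[str]) -> int:
    """Run a single ffmpeg process and return its exit code."""
    return subprocess.run(cmd, capture_output=True).returncode


def normalize_batch(files: list[Path], dst_dir: Path) -> list[int]:
    """Run ffmpeg on every file in parallel; exit codes come back in input order."""
    if not files:
        return []
    dst_dir.mkdir(parents=True, exist_ok=True)
    commands = [build_command(f, dst_dir) for f in files]
    # Each thread only waits on its child process, so threads are sufficient here.
    with ThreadPoolExecutor(max_workers=worker_count(len(files))) as pool:
        return list(pool.map(run_one, commands))
```

A call such as `normalize_batch(sorted(Path("in").glob("*.wav")), Path("out"))` would then return one exit code per input file, with 0 indicating that ffmpeg reported success.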
2

Ultimate Vocal Remover (UVR5)

GUI for a Vocal Remover that uses Deep Neural Networks

This application uses state-of-the-art source separation models to remove vocals from audio files. UVR's core developers trained all of the models provided in this package (except for the Demucs v3 and v4 4-stem models).

Downloads: 543 This Week

Last Update: 2025-01-20
See Project
3

Trurl

A command line tool for URL parsing and manipulation

trurl is a command-line tool developed by the curl project for parsing and manipulating URLs. It allows users to modify URL components easily, aiding in tasks like scripting and testing.

Downloads: 1 This Week

Last Update: 2025-05-12
See Project
4

HandBrake

An open source tool to convert video from any format to modern codecs

HandBrake is an open-source, GPL-licensed, multiplatform, multithreaded video transcoder, available for MacOS X, Linux and Windows.

3 Reviews

Downloads: 309 This Week

Last Update: 2026-03-09
See Project
5

abogen

Generate audiobooks from EPUBs, PDFs and text with captions

...In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on the go, or for users who prefer audio over reading. The repository supports handling common ebook formats and generating outputs that combine audio plus caption metadata. By automating text-to-speech for arbitrary documents, abogen reduces the friction of producing audiobooks and could be integrated into larger workflows (e.g., batch converting a library of texts).

Downloads: 8 This Week

Last Update: 2026-02-06
See Project
6

432Hz Batch Converter

Converts and re-encodes music to 432Hz

...Play a piece of music at 440Hz and at 432Hz, and see which one you prefer. Most people choose the 432Hz version, and it's hard to go back to 440Hz. This application re-encodes your audio files while shifting the pitch to 432Hz. It uses a very high-quality pitch-shifting algorithm. Supports Windows (Win 7 SP1 or later) and Linux (all distros). Installation instructions: https://github.com/mysteryx93/HanumanInstituteApps/wiki/432hz-Batch-Converter From Etienne Charland aka Hanuman, by a lightworker in his free time. ...

5 Reviews

Downloads: 70 This Week

Last Update: 2024-06-27
See Project
7

MonsterMusic

A music player for the Android platform, developed by Android composer

MonsterMusic is a command-line utility to manage and download music from various online platforms.

Downloads: 1 This Week

Last Update: 2026-02-17
See Project
8

StaxRip

Video encoding GUI for Windows

StaxRip is a powerful, open-source video and audio encoding GUI for Windows that orchestrates industry-standard console tools (such as x265, FFmpeg, mkvmerge) and frame-server systems (like AviSynth+ or VapourSynth) to allow users to transcode, mux, remux, or process media files with fine-grained control. It is not a “one-click” encoder; instead, it grants the user deep control over encoding settings, filtering, resizing, cropping, subtitles, audio processing, container formats, and more — making it a tool of choice for videophiles, enthusiasts, and anyone needing high-quality and customized media output. ...

62 Reviews

Downloads: 24 This Week

Last Update: 2026-03-09
See Project
9

OpenAI Go

The official Go library for the OpenAI API

...It enables developers to integrate OpenAI’s models and features into Go applications with a clean and idiomatic interface. The library provides support for a wide range of API endpoints including chat completions, assistants, embeddings, image generation, audio processing, and batch jobs. It includes built-in tools for handling authentication, managing API requests, and parsing structured responses. The repository also offers examples to help developers quickly set up projects and test different API calls. Designed for reliability and ease of use, it is maintained to stay aligned with the evolving OpenAI API specifications.

Downloads: 9 This Week

Last Update: 2 days ago
See Project
10

edge-tts

Use Microsoft Edge's online text-to-speech service from Python

...The library is asynchronous under the hood, which makes it efficient for batch jobs or web services that need to synthesize many utterances concurrently.

Downloads: 16 This Week

Last Update: 2025-12-12
See Project
11

Voice-Pro

Comprehensive Gradio WebUI for audio processing

Voice-Pro is a Gradio WebUI for transcription, translation and text-to-speech. It installs with one click and creates a virtual environment using Miniconda that runs completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.

1 Review

Downloads: 14 This Week

Last Update: 2025-12-05
See Project
12

Hugging Face - Speech To Speech

Open speech-to-speech models and pipelines by Hugging Face toolkit AI

...It integrates with the broader Hugging Face ecosystem, making it easier to load pretrained models and run inference. It also serves as a foundation for building real-time or batch audio transformation systems. Overall, it highlights an emerging approach to voice technology that reduces latency and preserves more of the original speech characteristics.

Downloads: 0 This Week

Last Update: 1 day ago
See Project
13

Cookbook (Google Gemini)

Examples and guides for using the Gemini API

The Gemini Cookbook is an official repository of examples and guides for using Google’s Gemini API. It provides a structured learning path with quick-start tutorials for beginners and practical examples for advanced users. The repository covers a wide range of Gemini capabilities, including text, images, video, speech, robotics, and multimodal interactions. It highlights newly introduced features such as Gemini 2.5 models (Flash and Pro), Gemini’s native image generation, Veo for video...

Downloads: 5 This Week

Last Update: 5 hours ago
See Project
14

GenAI Processors

GenAI Processors is a lightweight Python library

GenAI Processors is a lightweight Python library for building modular, asynchronous, and composable AI pipelines around Gemini. Its central abstraction is the Processor, a unit of work that consumes an asynchronous stream of parts (text, images, audio, JSON) and produces another stream, making it natural to chain operations and keep everything streaming end-to-end. Processors can be composed sequentially (to build multi-step flows) or in parallel (to fan-out work and merge results), which makes sophisticated agent behaviors easy to express with simple operators. The library offers built-in processors for classic turn-based Gemini calls as well as Live API streaming, so you can mix “batch” and real-time interactions in the same graph. ...

Downloads: 2 This Week

Last Update: 2026-03-10
See Project
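
The Processor idea described above (a unit of work that consumes an asynchronous stream of parts and yields another stream, so that steps chain naturally) can be illustrated with plain Python async generators. This sketch does not use the GenAI Processors API: `source`, `upper`, `exclaim`, `chain` and `collect` are hypothetical names, parts are reduced to plain strings, and only sequential composition is shown.

```python
import asyncio
from typing import AsyncIterator, Callable

# Hypothetical stand-in: a processor maps one async stream of parts to another.
Processor = Callable[[AsyncIterator[str]], AsyncIterator[str]]


async def source(parts: list[str]) -> AsyncIterator[str]:
    """Turn a list into an asynchronous stream of parts."""
    for part in parts:
        yield part


async def upper(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    """Processor: upper-case each part as it arrives."""
    async for part in stream:
        yield part.upper()


async def exclaim(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    """Processor: append an exclamation mark to each part."""
    async for part in stream:
        yield part + "!"


def chain(*processors: Processor) -> Processor:
    """Compose processors sequentially; each one consumes the previous output."""
    def chained(stream: AsyncIterator[str]) -> AsyncIterator[str]:
        for processor in processors:
            stream = processor(stream)
        return stream
    return chained


async def collect(stream: AsyncIterator[str]) -> list[str]:
    """Drain a stream into a list."""
    return [part async for part in stream]


def run_demo() -> list[str]:
    pipeline = chain(upper, exclaim)
    return asyncio.run(collect(pipeline(source(["hello", "audio"]))))
```

Because every stage is a generator, each part flows through the whole chain before the next part is pulled from the source, which is what keeps such a pipeline streaming end-to-end.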
15

VidCoder

A Blu-ray, DVD and video file transcoder for Windows

VidCoder is a Windows-based open-source video transcoding and ripping tool that provides a graphical interface built around standard command-line multimedia tools. It lets users convert video files (or rip DVDs/Blu-rays, when supported) into modern formats and codecs, making it useful for people who want to compress, re-encode, or transcode video content without dealing directly with low-level encoder settings. Because VidCoder integrates and automates the invocation of complex backend...

Downloads: 10 This Week

Last Update: 2026-03-12
See Project
16

comfyui-mixlab-nodes

Workflow and speech recognition app

comfyui-mixlab-nodes is a large collection of custom nodes for ComfyUI that turns workflows into interactive apps and adds real-time multimedia, LLM, and TTS capabilities. It introduces a “Workflow-to-APP” concept, where a ComfyUI graph can be transformed into a Web App through an AppInfo node, complete with categories, batch prompts, and editable configurations. The project also brings Real-time Design features like screen capture and floating video nodes, enabling creative pipelines that mix live screen content, generative models, and visual effects. For audio and speech, it provides nodes for SpeechRecognition and SpeechSynthesis, plus workflows that combine voice generation with real-time face swapping and other audio-visual effects. ...

Downloads: 5 This Week

Last Update: 2025-11-28
See Project
17

pyVideoTrans

Translate the video from one language to another and embed dubbing

pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. The tool supports both command-line and GUI modes, making it accessible to developers and creatives needing batch or automated processing.

Downloads: 19 This Week

Last Update: 2026-03-10
See Project
18

ChatTTS_colab

One-click deployment (including offline integration package)

...It provides an integrated offline bundle and scripts for Windows and macOS so users can run ChatTTS locally without wrestling with complex environment setup. The repository includes Colab notebooks that launch a Gradio-based web UI and expose streaming TTS, making it possible to listen to generated audio as it is produced. A distinctive feature is the “voice gacha” system, which batch-generates many distinct voice timbres and allows users to save the ones they like into a curated voice library. It has first-class support for long-form audio generation, making it suitable for audiobooks, podcasts, or long narration tasks. The project also implements multi-speaker or role-based reading, letting users assign different voices to different characters in a script and even use a large language model to generate that script in one step.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
19

notebooklm-py

Unofficial Python API and agentic skill for Google NotebookLM

notebooklm-py is an unofficial Python API and agent-ready integration layer for Google NotebookLM that exposes NotebookLM functionality through code, the command line, and AI agent workflows. Its goal is to provide programmatic access not just to standard notebook operations, but also to many capabilities that are either limited or unavailable in the web interface, making it especially useful for automation and custom pipelines. The project covers notebook management, source ingestion,...

Downloads: 2 This Week

Last Update: 2 days ago
See Project
20

Scanopy

Clean network diagrams, One-time setup, zero upkeep

Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...

Downloads: 25 This Week

Last Update: 4 days ago
See Project
21

YouTube Playlist Downloader

A tool to download whole playlists, channels or single videos

YoutubePlaylistDownloader is a desktop-based utility designed to simplify the process of downloading entire YouTube playlists with minimal user interaction. The tool allows users to input a playlist URL and automatically retrieve all associated videos, handling the sequence and download process in a structured way. It supports multiple output formats and quality settings, enabling users to choose between audio or video downloads depending on their needs. The application is built with...

Downloads: 1 This Week

Last Update: 1 day ago
See Project
22

WanGP

AI video generator optimized for low VRAM and older GPUs

Wan2GP is an open source AI video generation toolkit designed to make modern generative models accessible on consumer-grade hardware with limited GPU memory. It acts as a unified interface for running multiple video, image, and audio generation models, including Wan-based models as well as other systems like Hunyuan Video, Flux, and Qwen. A key focus of the project is reducing VRAM requirements, enabling some workflows to run on as little as 6 GB while still supporting older Nvidia and...

Downloads: 0 This Week

Last Update: 1 day ago
See Project
23

Qwen3-ASR

Qwen3-ASR is an open-source series of ASR models

Qwen3-ASR is an automatic speech recognition system in the QwenLM family, developed to convert spoken language into text with strong accuracy and real-time performance. As a specialized ASR variant of the broader Qwen language model ecosystem, it focuses on capturing reliable transcriptions from audio sources such as recordings, live streams, or conversational inputs while supporting low latency use cases. The architecture combines advanced neural acoustic modeling with context-aware...

Downloads: 0 This Week

Last Update: 2026-02-09
See Project
24

gTTS

Python library and CLI tool to interface with Google Translate

...A small CLI utility, gtts-cli, makes it easy to test or batch-generate MP3 files right from the shell.

Downloads: 2 This Week

Last Update: 2025-11-28
See Project
25

Whisper Batch Transcriber

Unlimited, private and free Speech-To-Text program

About: Automatically transcribe all of your voice recordings into clean, organized text files. It's free, fully automated and unlimited, using state-of-the-art speech-to-text technology. Works 100% offline on your computer, privately and locally. Use cases: Convert speeches, podcasts, webinars, monologues, storytelling and other spoken audio into a formatted .txt file, one sentence per line. Notes: It's 2GB in size and requires 2-6GB of GPU VRAM too. (basically...

Downloads: 6 This Week

Last Update: 2025-07-16
See Project