Search Results for "audio and video stream" - Page 4

Showing 1742 open source projects for "audio and video stream"

1

FFmpeg Batch AV Converter

Free all in one audio/video ffmpeg batch encoder

FFmpeg Batch AV Converter is a free universal audio and video encoder for Windows and Linux (via Wine) that exposes the full potential of the ffmpeg command line through a convenient GUI with drag and drop and progress information. Wizards make things easy for non-experts. Thanks to its multi-file encoding feature, it may be the fastest a/v batch encoder available, since it maximizes system resource usage by launching simultaneous processes up to the user's CPU thread count. ...

32 Reviews

Downloads: 2,962 This Week

Last Update: 7 days ago
See Project
2

Navidrome

Your Personal Streaming Service

Navidrome is an open-source, web-based personal music server that lets you stream and manage your entire music collection from any browser or compatible mobile app, effectively turning your own files into a cloud-accessible music service. It supports large libraries and handles a wide variety of audio formats while maintaining very low resource usage, so it runs well even on small servers, Raspberry Pi devices, and other constrained hardware.

Downloads: 8 This Week

Last Update: 3 days ago
See Project
3

Plyr

Simple HTML5, YouTube and Vimeo player

Plyr is a simple, lightweight, accessible and customizable media player for HTML5 video, HTML5 audio, YouTube and Vimeo that supports modern browsers. Accessible: full support for VTT captions and screen readers. Customizable: make the player look how you want with the markup you want. It also offers premium video monetization from Video Intelligence.

Downloads: 2 This Week

Last Update: 2026-01-03
See Project
4

idonthavespotify

Effortlessly convert Spotify links to your preferred streaming service

Copy a link from your favorite streaming service, paste it into the search bar, and voilà! Links to the track on all other supported platforms are displayed. If the original source is Spotify you'll even get a quick audio preview to ensure it's the right track.

Downloads: 0 This Week

Last Update: 2025-12-23
See Project
5

Frigate NVR

NVR with realtime local object detection for IP cameras

Frigate is a local network video recorder designed for real-time object detection on IP camera streams using machine learning. It runs entirely on local hardware and integrates closely with Home Assistant to provide smart surveillance without relying on cloud processing. The system uses OpenCV and TensorFlow to analyze video feeds and detect objects such as people, vehicles, and animals in real time. Frigate is optimized for efficiency and supports hardware acceleration across a wide range...

Downloads: 3 This Week

Last Update: 2026-03-19
See Project
6

Nextcloud Talk

Video- & audio-conferencing app for Nextcloud

Nextcloud Talk is the official chat, video and audio conferencing app for Nextcloud that allows users to chat, call and screenshare with multiple other users. Nextcloud offers better protection for your communication, as it provides end-to-end encryption and keeps even metadata from leaking. You can have private, group, public or password-protected calls by simply inviting one person or a whole group, or by sending a public link as an invitation to a call.

1 Review

Downloads: 7 This Week

Last Update: 1 day ago
See Project
7

Peer Calls

Group peer-to-peer video calls for everyone, written in Go

Peer Calls is a self-hosted, open-source WebRTC-based video and audio calling platform for group communication. Designed for simplicity and privacy, it allows anyone to run their own video conferencing service without relying on third-party providers. Peer Calls supports multi-user rooms, screen sharing, and chat, all delivered via a clean web interface. It’s great for small teams, communities, and educational groups seeking secure and customizable alternatives to mainstream conferencing tools.

Downloads: 0 This Week

Last Update: 2025-04-08
See Project
8

Amazon Chime SDK React Components

Chime React Component Library with integrations with the Amazon SDK

The Amazon Chime SDK makes it easy to add collaborative audio calling, video calling, and screen share features to web applications by using the same infrastructure services that power millions of Amazon Chime online meetings. The Amazon Chime SDK React Component Library supplies client-side state management and reusable UI components for common web interfaces used in audio and video conferencing applications, including: video tile grids, microphone activity indicators, and call controls. ...

Downloads: 0 This Week

Last Update: 2025-11-20
See Project
9

LiveAvatar

Streaming Real-time Audio-Driven Avatar Generation

LiveAvatar is an open-source research and implementation project that provides a unified framework for real-time, streaming, interactive avatar video generation driven by audio and other control signals. It implements techniques from state-of-the-art diffusion-based avatar modeling to support infinite-length continuous video generation with low latency, enabling interactive AI avatars that maintain continuity and realism over extended sessions. The project co-designs algorithms and system optimizations, such as block-wise autoregressive processing and fast sampling strategies, to deliver real-time frame rates (e.g., ~45 FPS on appropriate GPU clusters) while handling non-stop generation without quality degradation. ...

Downloads: 0 This Week

Last Update: 2026-01-30
See Project
10

RtspSimpleServer

Ready-to-use RTSP / RTMP / LL-HLS / WebRTC server and proxy

rtsp-simple-server is a ready-to-use, zero-dependency server and proxy that allows users to publish, read and proxy live video and audio streams. Publish live streams to the server. Read live streams from the server. Proxy streams from other servers or cameras, always or on-demand. Streams are automatically converted from one protocol to another; for instance, it's possible to publish a stream with RTSP and read it with HLS. Serve multiple streams at once in separate paths. Authenticate users; use internal or external authentication. ...

Downloads: 43 This Week

Last Update: 3 days ago
See Project
11

yt-dlp

A youtube-dl fork with additional features and fixes

yt-dlp is a youtube-dl fork based on the now inactive youtube-dlc. The main focus of this project is adding new features and patches while also keeping up to date with the original project.

Downloads: 512 This Week

Last Update: 2026-03-17
See Project
12

Story Flicks

Generate high-definition story short videos with one click using AI

...Because the project is open and modifiable, developers can customize the generation pipeline: adjust story structure, alter rendering parameters, tweak video quality or resolution, or integrate with other AI models (e.g. for audio, voice-over, or image-to-video). It’s especially useful as a starting template or experimentation ground for developers building automated content-creation tools.

Downloads: 7 This Week

Last Update: 2025-12-14
See Project
13

SALMONN family

A suite of advanced multi-modal LLMs

SALMONN is a family of advanced multi-modal large language models (LLMs) developed by ByteDance — designed to handle and integrate multiple data modalities (e.g. text, audio, video) rather than just plain text. The repository bundles different branches targeting specialized tasks (e.g. video-SALMONN, speech-quality assessment, general multimodal tasks), suggesting that the project is modular and extensible across domains. SALMONN aims to push the frontier of multi-modal AI by allowing models to process and reason over diverse inputs, which can be useful for applications such as video understanding, speech analytics, cross-modal retrieval, and general AI capable of interpreting rich, multi-sensory data. ...

Downloads: 0 This Week

Last Update: 2025-12-02
See Project
14

YoutubeExplode

Abstraction layer over YouTube's internal API

...The project exposes a clean API that allows applications to query videos, playlists, channels, and search results without relying on the official YouTube Data API. Under the hood, the library parses raw page data and leverages reverse-engineered internal endpoints to obtain structured information and stream manifests. Developers can use it to access details such as titles, authors, durations, captions, and available media formats, as well as to download audio or video streams for further processing. The library is designed to be intuitive and cross-platform through .NET Standard compatibility, making it suitable for desktop tools, automation pipelines, and media utilities.

Downloads: 2 This Week

Last Update: 2026-02-20
See Project
15

Transcoder

Hardware-accelerated video transcoding using Android MediaCodec APIs

Transcoder by DeepMedia is an Android library for hardware-accelerated video transcoding built on the MediaCodec APIs. It transcodes and compresses video files into the MP4 format, with audio support, using the hardware-accelerated codecs available on the device.

Downloads: 1 This Week

Last Update: 2025-03-25
See Project
16

Moshi

A speech-text foundation model for real time dialogue

...At inference, the stream from the user is taken from the audio input, and the one for Moshi is sampled from the model's output. Alongside these two audio streams, Moshi predicts text tokens corresponding to its own speech, its inner monologue, which greatly improves the quality of its generation. A small Depth Transformer models inter-codebook dependencies for a given time step, while a large, 7B-parameter Temporal Transformer models the temporal dependencies.

Downloads: 1 This Week

Last Update: 2024-11-05
See Project
17

AI-Media2Doc

AI tool converting video/audio into structured documents instantly

AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not uploaded externally. ...

Downloads: 1 This Week

Last Update: 2026-03-18
See Project
18

Vidi2

Large Multimodal Models for Video Understanding and Editing

Vidi is a family of large multimodal models developed for deep video understanding and editing tasks, integrating vision, audio, and language to allow sophisticated querying and manipulation of video content. It’s designed to process long-form, real-world videos and answer complex queries such as “when in this clip does X happen?” or “where in the frame is object Y during that moment?” — offering temporal retrieval, spatio-temporal grounding (i.e. locating objects over time + space), and even video question answering. ...

Downloads: 0 This Week

Last Update: 2026-03-04
See Project
19

MediaDevices

Go implementation of the MediaDevices API

mediadevices is a Go library developed by the Pion WebRTC team that enables real-time access to audio and video devices for building native Go applications involving media streaming and conferencing. It provides a cross-platform, unified API for capturing and manipulating media streams and is often used in combination with Pion WebRTC for peer-to-peer communications. Its support for device enumeration, media constraints, and frame processing makes it a powerful building block for custom voice and video solutions in Go.

Downloads: 0 This Week

Last Update: 2026-02-05
See Project
20

WhisperJAV

Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

WhisperJAV is an open-source speech transcription pipeline designed specifically for generating subtitles for Japanese adult video content. The project addresses challenges that standard speech recognition models face when transcribing this type of audio, which often includes low signal-to-noise ratios and large numbers of non-verbal vocalizations. Traditional automatic speech recognition systems can misinterpret these sounds as words, leading to inaccurate transcripts. ...

Downloads: 7 This Week

Last Update: 5 days ago
See Project
21

audioFlux

A library for audio and music analysis, feature extraction

audioFlux is a deep learning tool library for audio and music analysis and feature extraction. It can be used for deep learning, pattern recognition, signal processing, bioinformatics, statistics, finance, etc. It supports dozens of time-frequency analysis transformation methods and hundreds of corresponding time-domain and frequency-domain feature combinations. It can be provided to deep learning networks for training and is used to...

Downloads: 0 This Week

Last Update: 2024-08-09
See Project
22

Amazon Chime SDK for JavaScript

A JavaScript client library for integrating multi-party communications

The Amazon Chime SDK is a set of real-time communications components that developers can use to quickly add messaging, audio, video, and screen sharing capabilities to their web or mobile applications. Developers can build on AWS's global communications infrastructure to deliver engaging experiences in their applications. For example, they can add video to a health application so patients can consult remotely with doctors on health issues, or create customized audio prompts for integration with the public telephone network. ...

Downloads: 0 This Week

Last Update: 2026-01-02
See Project
23

pyVideoTrans

Translate the video from one language to another and embed dubbing

pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...

Downloads: 17 This Week

Last Update: 2026-03-27
See Project
24

VidCoder

A Blu-ray, DVD and video file transcoder for Windows

VidCoder is a Windows-based open-source video transcoding and ripping tool that provides a graphical interface built around standard command-line multimedia tools. It lets users convert video files (or rip DVDs/Blu-rays, when supported) into modern formats and codecs, making it useful for people who want to compress, re-encode, or transcode video content without dealing directly with low-level encoder settings. Because VidCoder integrates and automates the invocation of complex backend...

Downloads: 3 This Week

Last Update: 2026-03-12
See Project
25

BizHawk

BizHawk is a multi-system emulator written in C#

...As well as quality-of-life features for casual players, it also has recording/playback and debugging tools, making it the first choice for TASers (Tool-Assisted Speedrunners). Screenshotting and recording audio + video to file. Firmware management, input, framerate, and more in a HUD over the game. Rebindable hotkeys for controlling the frontend (keyboard+mouse+gamepad). A comprehensive input mapper for the emulated gamepads and other peripherals. Programmatic control over core and frontend with Lua or C#.NET. Development builds are made automatically whenever someone contributes. ...

Downloads: 34 This Week

Last Update: 2025-09-20
See Project