Open Speech Corpora is a curated catalog of speech datasets for research and development in automatic speech recognition (ASR), text-to-speech (TTS), and related speech technologies. The repository is organized as a set of tables that list each corpus with its language, total hours, number of speakers, download link, and license, so practitioners can quickly find data that matches their needs. It emphasizes free and truly "open" datasets, favoring those released under Creative Commons or other community-friendly data licenses, while also listing corpora that are usable for research and, in many cases, for commercial work. The catalog covers well-known resources such as Mozilla Common Voice, Yesno, and LJ Speech, as well as numerous Nordic and parliamentary speech corpora, and records their license variants, such as CC-0 and CC-BY. It is actively maintained as a community resource: users are encouraged to propose new corpora via issues, and there is a backlog of datasets waiting to be integrated.

Features

  • Centralized catalog of speech corpora for ASR, TTS and related tasks
  • Detailed metadata including language, duration, speakers, download links and licenses
  • Emphasis on free and open datasets suitable for research and many commercial uses
  • Coverage of popular corpora like Common Voice, LJ Speech and multiple Nordic resources
  • Community-driven updates via issues and pull requests to keep the list evolving
  • License-based grouping (CC-0, CC-BY and more) to simplify compliance and dataset selection
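Because every catalog row carries the same fields (language, hours, speakers, download link, license), selecting data by license reduces to a simple filter over structured records. The sketch below is illustrative only. It assumes the rows have been copied into an in-memory list, and the `CorpusEntry` type, the `group_by_license` and `select` helpers, and the placeholder entries and example.org URLs are all hypothetical. They are not part of the catalog and are not real catalog data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusEntry:
    """One catalog row; fields mirror the columns the catalog lists (hypothetical type)."""
    name: str
    language: str
    hours: float
    speakers: int
    url: str
    license: str


def group_by_license(entries):
    """Group entries by their license string, e.g. 'CC-0' or 'CC-BY' (hypothetical helper)."""
    groups = {}
    for entry in entries:
        groups.setdefault(entry.license, []).append(entry)
    return groups


def select(entries, language, allowed_licenses, min_hours=0.0):
    """Keep entries in one language, under an allowed license, with at least min_hours of audio."""
    return [
        e for e in entries
        if e.language == language
        and e.license in allowed_licenses
        and e.hours >= min_hours
    ]


# Hypothetical placeholder rows for illustration only; NOT real catalog data.
catalog = [
    CorpusEntry("corpus-a", "English", 24.0, 1, "https://example.org/a", "CC-0"),
    CorpusEntry("corpus-b", "English", 5.0, 40, "https://example.org/b", "CC-BY"),
    CorpusEntry("corpus-c", "Swedish", 100.0, 300, "https://example.org/c", "CC-BY"),
]

groups = group_by_license(catalog)
print(sorted(groups))                # ['CC-0', 'CC-BY']

picked = select(catalog, "English", {"CC-0", "CC-BY"}, min_hours=10.0)
print([e.name for e in picked])      # ['corpus-a']
```

Matching on the license string only narrows the candidates. The license text published with each corpus still determines what a given use actually permits.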

Categories

Text to Speech

License

MIT License

Additional Project Details

Registered

2025-11-28