whisper-timestamped

Multilingual automatic speech recognition with word-level timestamps and confidence scores. Whisper is a set of robust multilingual speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. The models were trained to predict approximate timestamps for speech segments (most of the time with 1-second accuracy), but they do not natively predict word timestamps. This repository provides an implementation that predicts word timestamps and gives a more accurate estimate of speech segment boundaries when transcribing with Whisper models. It also assigns a confidence score to each word and each segment.

Features

More accurate start/end estimation of speech segments
Documentation available
Confidence scores are assigned to each word
When possible (without beam search...), no additional inference steps are required to predict word timestamps: word alignment is done on the fly after each speech segment is decoded
Special care has been taken regarding memory usage
Light installation for CPU
Plot of word alignment
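
As an illustration of how per-word timestamps and confidence scores might be consumed, the sketch below scans a transcription result for words whose confidence falls below a threshold. It is not part of the project. It assumes the result is a dict with a "segments" list, where each segment has a "words" list whose entries carry "text", "start", "end", and "confidence" keys; that layout, the function name flag_low_confidence_words, and the sample data are assumptions made for this example and should be checked against the project's own documentation.

```python
from typing import Any


def flag_low_confidence_words(
    result: dict[str, Any], threshold: float = 0.5
) -> list[dict[str, Any]]:
    """Collect words whose confidence is below `threshold`.

    `result` is assumed (not guaranteed) to look like:
    {"segments": [{"words": [{"text", "start", "end", "confidence"}, ...]}, ...]}
    """
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            confidence = word.get("confidence")
            # Words without a confidence value are skipped rather than guessed at.
            if confidence is not None and confidence < threshold:
                flagged.append(
                    {
                        "text": word["text"],
                        "start": word["start"],
                        "end": word["end"],
                        "confidence": confidence,
                    }
                )
    return flagged


# Hand-written sample data in the assumed layout (not real model output).
sample_result = {
    "segments": [
        {
            "start": 0.0,
            "end": 1.1,
            "words": [
                {"text": "Bonjour", "start": 0.0, "end": 0.5, "confidence": 0.92},
                {"text": "monde", "start": 0.6, "end": 1.1, "confidence": 0.31},
            ],
        }
    ]
}

for w in flag_low_confidence_words(sample_result):
    print(f"{w['text']} [{w['start']:.2f}s-{w['end']:.2f}s] confidence={w['confidence']:.2f}")
```

Flagged words could then be routed to manual review or highlighted in a subtitle editor. The 0.5 threshold is arbitrary and would need tuning per language and model size.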

License

GNU Affero General Public License (AGPL)

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Machine Learning Software, Python LLM Inference Tool

Registered

2024-08-14
