Multilingual Automatic Speech Recognition with word-level timestamps and confidence. Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This repository proposes an implementation to predict word timestamps and provide a more accurate estimation of speech segments when transcribing with Whisper models. Besides, a confidence score is assigned to each word and each segment.
Features
- The start/end estimation is more accurate
- Documentation available
- Confidence scores are assigned to each word
- If possible (without beam search...), no additional inference steps are required to predict word timestamps (word alignment is done on the fly after each speech segment is decoded)
- Special care has been taken regarding memory usage
- Light installation for CPU
- Plot of word alignment
License
Affero GNU Public LicenseFollow whisper-timestamped
Other Useful Business Software
Go From AI Idea to AI App Fast
Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
Rate This Project
Login To Rate This Project
User Reviews
Be the first to post a review of whisper-timestamped!