Denoiser is a real-time speech enhancement model that operates directly on raw waveforms and is designed to clean noisy audio while running efficiently on a CPU. It uses a causal encoder-decoder architecture with skip connections, trained with losses defined in both the time domain and the frequency domain so that it suppresses noise while preserving speech. Because it works on the waveform rather than on spectrograms alone, it needs no spectral inversion step, which lowers latency and yields a coherent waveform output. Training applies data augmentation directly to the raw waveforms (e.g. noise mixing, reverberation) to improve robustness and generalization to diverse noise types. The project supports both offline denoising (batch inference) and live audio processing (e.g. via loopback audio interfaces), making it practical for real-time use in calls or recordings. The codebase includes training and evaluation scripts, configuration management via Hydra, and models pretrained on standard noise datasets.
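To illustrate what a loss defined in both the time and frequency domains can look like, here is a minimal pure-Python sketch. It assumes an L1 term on the samples plus a mean absolute difference between STFT magnitudes; the exact loss terms, weights, and window sizes used by the project are not stated here, and the names `stft_magnitudes`, `combined_loss`, and `freq_weight` are hypothetical.

```python
import cmath
import math


def stft_magnitudes(signal, frame_size, hop):
    """Magnitude spectra of Hann-windowed frames (naive DFT, only for tiny inputs)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        # A real signal has a symmetric spectrum, so keep bins 0..frame_size/2.
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def combined_loss(estimate, clean, frame_size=16, hop=4, freq_weight=1.0):
    """Time-domain L1 plus weighted mean absolute STFT-magnitude difference.

    Assumes both signals have the same length, at least `frame_size` samples.
    """
    time_term = sum(abs(e - c) for e, c in zip(estimate, clean)) / len(clean)
    est_mag = stft_magnitudes(estimate, frame_size, hop)
    ref_mag = stft_magnitudes(clean, frame_size, hop)
    diffs = [abs(a - b)
             for est_frame, ref_frame in zip(est_mag, ref_mag)
             for a, b in zip(est_frame, ref_frame)]
    freq_term = sum(diffs) / len(diffs)
    return time_term + freq_weight * freq_term
```

In a real training loop the same idea would be expressed with differentiable tensor operations so that gradients flow through both terms; the naive DFT above is only meant to make the two terms concrete.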
Features
- Causal waveform-domain speech enhancement (no spectral inversion)
- Encoder-decoder architecture with skip connections for high fidelity
- Combined time-domain and frequency-domain loss optimization
- Raw waveform data augmentation to boost robustness against noise/reverb
- Support for live audio processing with low latency
- Training and evaluation scripts, pretrained models, and Hydra-based configuration
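As an example of the raw-waveform augmentation listed above, noise mixing is often done by scaling a noise clip to a chosen signal-to-noise ratio before adding it to clean speech. The sketch below shows that general technique in plain Python; it is not taken from the project's code, and `mix_at_snr` and its parameters are hypothetical names.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db` dB, then add it.

    Assumes `speech` and `noise` are equal-length lists of float samples.
    """
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        # Silent noise clip: nothing to mix in.
        return list(speech)
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


# Usage: mix a synthetic tone with Gaussian noise at a random SNR per example.
rng = random.Random(0)
speech = [math.sin(0.1 * n) for n in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
noisy = mix_at_snr(speech, noise, snr_db=rng.uniform(0.0, 20.0))
```

Drawing the SNR at random for each training example exposes the model to a range of noise levels, which is one way such augmentation can help generalization.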