This project provides an implementation of coupled 3D convolutional neural networks (CNNs) for audio-visual matching; lip reading is one specific application. The input pipeline must be prepared by the user. Audio-visual recognition (AVR) has been considered as a solution for speech recognition when the audio is corrupted, and as a visual recognition method for speaker verification in multi-speaker scenarios. AVR systems leverage the information extracted from one modality to improve recognition in the other by complementing its missing information. The essential problem, and the goal of this work, is to find the correspondence between the audio and visual streams. We propose a coupled 3D CNN architecture that maps both modalities into a representation space and evaluates the correspondence of audio-visual streams using the learned multimodal features.
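As a rough illustration of what "evaluating correspondence" in a shared representation space can look like, the sketch below scores one audio embedding against one visual embedding. It is not the project's code: the function names, the use of Euclidean distance on L2-normalised vectors, and the threshold value are all assumptions made for this example.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def correspondence_distance(audio_emb, visual_emb):
    """Euclidean distance between two L2-normalised embeddings.

    Assumption: both networks emit vectors of the same length, and a
    smaller distance means the streams are more likely to correspond.
    """
    if len(audio_emb) != len(visual_emb):
        raise ValueError("embeddings must have the same dimensionality")
    a = l2_normalize(audio_emb)
    v = l2_normalize(visual_emb)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, v)))


def is_match(audio_emb, visual_emb, threshold=0.5):
    """Declare a match when the distance falls below a hypothetical threshold."""
    return correspondence_distance(audio_emb, visual_emb) < threshold
```

In practice the threshold would be tuned on held-out pairs; the value used here is only a placeholder.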

Features

  • The proposed architecture incorporates both spatial and temporal information
  • The input pipeline must be provided by the user
  • For lip tracking, the desired video must be fed as the input
  • Running the lip-tracking script extracts the lip motions by saving the mouth area of each frame and creates an output video with a rectangle around the mouth area
  • In the visual section, the videos are post-processed to have an equal frame rate of 30 f/s
  • The proposed architecture uses two non-identical ConvNets that take a pair of speech and video streams as input
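
The 30 f/s post-processing step mentioned above can be approximated by picking, for each output frame, the source frame that is on screen at that moment. The sketch below is a minimal illustration under that assumption; `resample_frame_indices` is a hypothetical helper, not a function from this project, and it only computes frame indices (decoding and writing the video are left to the user's input pipeline).

```python
import math


def resample_frame_indices(num_frames, src_fps, target_fps=30.0):
    """Return the source frame index to show for each output frame.

    Frames are duplicated when src_fps < target_fps and dropped when
    src_fps > target_fps, so the clip duration stays roughly the same.
    """
    if src_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if num_frames <= 0:
        return []
    # Number of output frames needed to cover the same duration.
    num_out = max(1, round(num_frames * target_fps / src_fps))
    indices = []
    for i in range(num_out):
        # Source frame on screen at output time i / target_fps.
        src = math.floor(i * src_fps / target_fps)
        indices.append(min(num_frames - 1, src))
    return indices
```

For example, a 5-frame clip recorded at 15 f/s maps to 10 output frames in which each source frame appears twice, while a clip already at 30 f/s is left unchanged.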

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Machine Learning Software, Python Speech Recognition Software

Registered

2022-08-11