Design And Implementation Of Lip Reading System Based On Deep Learning

Posted on:2022-03-04

Degree:Master

Type:Thesis

Country:China

Candidate:H Z Shi

Full Text:PDF

GTID:2518306341452114

Subject:Electronics and Communications Engineering

Abstract/Summary:

Lip reading, also known as visual speech recognition, is a technology that recognizes what a speaker is saying by relying only on the visual information in the movements of the lips. Unlike audio speech recognition, lip reading is not affected by the acoustic signal-to-noise ratio and can overcome the shortcomings of speech recognition in complex scenes, which greatly expands the possible scenarios for human-computer interaction. Lip reading combines computer vision and natural language processing and is a challenging field. With the rapid development of artificial intelligence, data-driven deep learning provides a new direction for lip-reading technology. Although a deep learning-based lip-reading system can achieve recognition accuracy far beyond that of humans, it still leaves considerable room for optimization. At the same time, highly accurate models usually have a large number of parameters, which greatly limits the practical application scenarios of lip reading. This thesis takes the core technology of lip reading as its entry point, with dataset implementation and system development as the main work. It focuses on slimming down large, complex models, improving model performance while reducing the number of parameters, and on that basis designs and implements a new, practical lip-reading system based on deep learning. The main work and innovations of this thesis are as follows:

(1) A highly scalable lip motion feature extraction module is proposed. The module's performance is improved in two ways: the structure of the time-domain feature extraction is modified, and a channel attention module is introduced to improve the accuracy of feature extraction. Both improvements can be introduced into most mainstream feature extraction networks, so they generalize and extend well.

(2) A lightweight lip motion feature extraction module that could in the future be deployed locally on mobile terminals is implemented. It is built by combining a lightweight convolutional network with the modules above. It sacrifices only a small amount of performance in exchange for a much smaller model, which greatly expands the practical application scenarios of the lip-reading system.

(3) A state-of-the-art lip-reading pipeline based on the fusion modeling of visual features and reconstructed audio features is proposed and implemented. This addresses the waste of audio information in the training set: both the audio and the video data in existing datasets are fully used for training, and the accuracy of the lip-reading system is greatly improved by jointly modeling features from different state spaces.
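
The abstract does not say which channel attention design the thesis uses. As a rough illustration of the general idea only, the sketch below assumes a squeeze-and-excitation style gate: each channel is averaged to a single descriptor, passed through a small bottleneck, and the resulting weight in (0, 1) rescales that channel. The function name `channel_attention`, the weight matrices `w1` and `w2`, and the toy input values are all hypothetical and are not taken from the thesis.

```python
import math

def channel_attention(features, w1, w2):
    """Rescale each channel of `features` by a gate in (0, 1).

    features: list of C channels, each a flat list of activations
              (all time steps and spatial positions of that channel).
    w1: R x C weight matrix (reduction layer); w2: C x R weight matrix.
    Assumption: squeeze-and-excitation style gating, not the thesis's design.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expansion layer followed by a sigmoid gate.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every activation in a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return scaled, gates

# Toy input: 4 channels with 6 activations each (arbitrary values).
features = [[0.1 * (c + 1) * (i + 1) for i in range(6)] for c in range(4)]
w1 = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]     # reduce 4 -> 2
w2 = [[0.3, -0.5], [0.7, 0.1], [-0.2, 0.4], [0.6, 0.6]]  # expand 2 -> 4
scaled, gates = channel_attention(features, w1, w2)
print([round(g, 3) for g in gates])
```

Because the gate only rescales channels and leaves the tensor shape unchanged, a block like this can in principle be placed after a convolutional stage of most feature extraction backbones, which is consistent with the generality the abstract claims. In a real network the weights would be learned during training rather than fixed by hand.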

Keywords/Search Tags:

deep learning, lip reading, spatial-temporal feature extraction, attention mechanism

Related items

1	Research On Video Person Re-identification Method Based On Spatial-temporal Attention Mechanism
2	Person Re-identification Based On Convolutional Neural Network
3	Study On Human Action Recognition Based On Non-local Spatial-temporal Residual Attention Mechanism
4	The Design And Realization Of Spatial-Temporal Feature Extraction And Recognition Algorithm For Human Action Analysis
5	Database Construction And Algorithm Research Of Visual Speech Recognition Based On Deep Learning
6	Research On Feature Fusion Strategies Of Attention Mechanism In Image Description
7	Video Saliency Prediction Based On Spatial-temporal Features
8	Reading Comprehension Model Based On Two-way Attention Mechanism And Conditional Random Field
9	The Research On Robust Spatial-temporal Co-occurrence Feature Extraction Algorithm For Facial Action Unit Detection
10	Research On Methods Of Images Deep Feature Extraction