Design And Implementation Of Lip Reading System Based On Deep Learning

Posted on: 2022-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Shi
Full Text: PDF
GTID: 2518306341452114
Subject: Electronics and Communications Engineering
Abstract/Summary:
Lip reading, also known as visual speech recognition, is a technology that recognizes what a speaker is saying by relying only on the visual information of lip movements. Unlike speech recognition, lip reading is not affected by the signal-to-noise ratio and can overcome the shortcomings of speech recognition in complex scenes, thus greatly expanding the possible scenarios of human-computer interaction. Lip reading combines the fields of computer vision and natural language processing and is a challenging field. With the rapid development of artificial intelligence technology, data-driven deep learning provides a brand new development direction for lip-reading technology. While a deep learning-based lip-reading system can achieve recognition accuracy far beyond that of humans, it still has a lot of room for optimization. At the same time, highly accurate models usually have a large number of parameters, which greatly limits the practical application scenarios of lip reading. This paper takes the core technology of lip reading as its point of entry, with dataset implementation and system development as the main work. It focuses on slimming down large, complex models, improving model performance while reducing the number of parameters, and on that basis designs and implements a new, practical lip-reading system based on deep learning. The main work and innovations of this paper are as follows:

(1) A highly scalable lip motion feature extraction module is proposed. This paper improves the performance of the lip motion feature extraction module in two ways: the structure of feature extraction in the time domain is modified, and a channel attention module is introduced to improve the accuracy of feature extraction. Both improvements can be introduced into most mainstream feature extraction networks and have good generality and extensibility.

(2) A lightweight lip motion feature extraction module that can be deployed locally on mobile terminals in the future is implemented. By combining a lightweight convolutional network with the above modules, the lightweight module sacrifices only a small amount of performance in exchange for a much smaller model, which greatly expands the practical application scenarios of the lip-reading system.

(3) A state-of-the-art lip-reading process based on the fusion modeling of visual features and reconstructed audio features is proposed and implemented. It addresses the waste of audio information in the training set: both the audio and the video data in existing datasets are fully used for training, and the accuracy of the lip-reading system is greatly improved through fusion modeling of features from different state spaces.
Keywords/Search Tags: deep learning, lip reading, spatial-temporal feature extraction, attention mechanism
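
The abstract mentions a channel attention module but does not say which variant the thesis uses. As an illustration only, the sketch below assumes a squeeze-and-excitation style design: average each channel, pass the result through a small two-layer bottleneck, and rescale each channel by the resulting gate. The function name, the weight shapes, and the flattened list-of-channels layout are all hypothetical and are not taken from the thesis.

```python
import math
import random


def channel_attention(features, w1, w2):
    """SE-style channel attention (assumed variant, not the thesis's own code).

    features: list of C channels, each a flat list of activations
              (time and space flattened together).
    w1: hidden x C weight matrix (reduction layer).
    w2: C x hidden weight matrix (expansion layer).
    Returns (reweighted features, per-channel gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels; sigmoid keeps gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every activation of a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return scaled, gates


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy input: 8 channels with 16 activations each, bottleneck of size 2.
rng = random.Random(0)
C, hidden_dim, n = 8, 2, 16
features = [[rng.random() for _ in range(n)] for _ in range(C)]
w1 = random_matrix(hidden_dim, C, rng)
w2 = random_matrix(C, hidden_dim, rng)
scaled, gates = channel_attention(features, w1, w2)
```

Because the block only rescales channels and leaves the tensor shape unchanged, it can in principle be dropped after any convolutional stage, which is consistent with the abstract's claim that the improvement carries over to most mainstream feature extraction networks.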