The Research On Lip Reading Technology Based On Deep Learning

Posted on: 2020-09-08  Degree: Master  Type: Thesis
Country: China  Candidate: Y D Wei  Full Text: PDF
GTID: 2428330623455830  Subject: Electronic and communication engineering
Abstract/Summary:
Hearing is an important channel of human communication, yet many people around the world cannot communicate easily with others because of hearing impairments. Lip reading is a special skill that can help them communicate. It can also be applied in fields such as speech recognition in noisy environments, security-system authentication, and public-safety analysis, so it has considerable research potential. With the development of deep learning, automatic lip reading has become achievable. This thesis focuses on lip reading based on deep learning.

Deep learning is data-driven, and a growing number of successful applications show that the quality of the data set determines the results of a deep learning model. To address the difficulty of building lip-reading data sets, the thesis proposes an automatic labeling system based on the pyramid LK (Lucas-Kanade) optical flow method. The system preprocesses each video using speech processing together with face and lip region localization, then uses optical flow to compute the motion of the lips between adjacent frames. From this motion information it determines the start and end times of lip movement in the video. Compared with labeling by speech recognition, the samples labeled by this system are more accurate, and the resulting lip-reading data set is of higher quality. To enable Chinese lip reading, the system was used to build a Chinese common-language lip-reading data set, CPLDS.

For the lip-reading model, the thesis introduces a deep neural network based on VGG and GRU networks to capture the spatial and temporal information in the lip region. Lip-motion feature extraction is divided into two stages: an improved VGG convolutional neural network extracts features from the lip image sequence, and a GRU network extracts the temporal features of the lip motion. The CTC function is used as the loss function. During training, Batch Normalization (BN) and Dropout are used to prevent over-fitting, and transfer learning is used to improve the generalization ability of the model. On the CPLDS and MIRACL-VC1 data sets, with a corpus size of 20, the proposed model achieves recognition rates of 97.3% and 96.6%, respectively. In this small-corpus scenario, it performs slightly better than current lip-reading network models.
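
The labeling step described in the abstract, which locates the start and end of lip movement from inter-frame motion, can be illustrated with a minimal sketch. The abstract does not state the actual decision rule, so everything here is an assumption: the input `motion` is taken to be a per-frame mean optical-flow magnitude over the lip region (computed elsewhere, for example by a pyramid LK tracker), and the names `find_motion_segment`, `frames_to_seconds`, `threshold`, and `min_frames` are hypothetical.

```python
from typing import List, Optional, Tuple


def find_motion_segment(
    motion: List[float],
    threshold: float,
    min_frames: int = 3,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices of the longest run with motion >= threshold.

    Assumption: motion[i] is the mean optical-flow magnitude in the lip
    region between frame i and frame i + 1. Runs shorter than min_frames
    are treated as noise. Returns None when no run qualifies.
    """
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    # A sentinel below any threshold closes a run that reaches the last frame.
    for i, value in enumerate(list(motion) + [float("-inf")]):
        if value >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            length = i - run_start
            if length >= min_frames and (
                best is None or length > best[1] - best[0] + 1
            ):
                best = (run_start, i - 1)
            run_start = None
    return best


def frames_to_seconds(segment: Tuple[int, int], fps: float) -> Tuple[float, float]:
    """Convert a (start, end) motion-index segment to start and end times."""
    start, end = segment
    # Motion index i spans frames i and i + 1, so the segment ends at frame end + 1.
    return start / fps, (end + 1) / fps


if __name__ == "__main__":
    # Hypothetical motion magnitudes; the values are made up for illustration.
    motion = [0.1, 0.2, 1.5, 2.0, 1.8, 1.6, 0.3, 0.1, 1.2, 0.2]
    segment = find_motion_segment(motion, threshold=1.0)
    print(segment)
    if segment is not None:
        print(frames_to_seconds(segment, fps=25.0))
```

In this sketch the isolated spike near the end of the sequence is discarded because it is shorter than `min_frames`. A real system would likely need a threshold tuned per video and some smoothing of the motion signal.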
Keywords/Search Tags:Deep Learning, Lip Reading, VGG, GRU, Pyramid LK Optical Flow