
Research And Implementation Of Lipreading Recognition Based On Deep Learning

Posted on: 2019-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: F Yang
GTID: 2348330569495553
Subject: Engineering
Abstract/Summary:
Lipreading recognition identifies the spoken text by observing the movement of a speaker's lips in video, and it is a challenging research topic in Computer Vision. The limited variation of lip movements and the richness of language content make lipreading recognition difficult, which has kept progress in the field slow. Since 2006, Deep Learning (DL) has made great progress and brought new hope to Artificial Intelligence, and its success in many fields gives us confidence that it can now be applied to lipreading recognition. Unlike traditional lipreading recognition, lipreading based on deep learning usually uses deep neural networks to extract features from images and interpret them. This thesis focuses on the acquisition and processing of lipreading data and on the network structure used for lipreading recognition.

This thesis is the first study of Chinese sentence-level lipreading recognition based on Deep Learning. A semi-automatic generation method was used to build a Chinese lipreading database, CCTVDS, which contains 14,975 samples totaling 7.25 GB. In addition, 269,558 pinyin/Chinese-character samples were added during the research to facilitate training of the network models.

We approach lipreading recognition from two aspects, image processing and language modeling. The first uses a deep neural network that combines a CNN based on the VGG-M model with an RNN; the second uses an Encoder-Decoder network based on a language model. Accordingly, this thesis divides Chinese lipreading recognition into two processing stages, each with its own sub-network. First, an improved VGG-M convolutional neural network, called ConvNet, extracts features from the lip image sequence, and a Long Short-Term Memory network (LSTM) then interprets these features and converts them into the corresponding pinyin character sequence; together these form the P2P (Pictures to Pinyin) network model. Next, an Encoder-Decoder network based on a language model converts the pinyin sentence into a Chinese character sentence: the Encoder network encodes the features of the pinyin character sequence, and the Decoder network decodes these features into the Chinese character sequence. This stage forms the P2CC (Pinyin to Chinese Characters) network model. Finally, building on this work, a hybrid neural network structure based on CNN and RNN, CNLipNet, is proposed.

Experimental results on the CCTVDS dataset show that lipreading recognition with Deep Learning has clear advantages over traditional lipreading recognition (using PCA, HMM, etc.). In addition, the proposed ChLipNet network model reduces the difficulty of recognizing Chinese lipreading. In sentence-level Chinese lipreading recognition, our sentence accuracy is 46.7% and the accuracy rate is 58.5%, slightly better than the results of the current best lipreading recognition network model.
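In the P2P stage, per-frame ConvNet features are fed into an LSTM. The abstract gives no layer sizes, weights, or training details, so the following is only a minimal pure-Python sketch of a standard LSTM cell run over a sequence of feature vectors. The names `LSTMCell` and `encode_sequence`, the sizes, the random initialization, and the toy input frames are all hypothetical illustrations, not the thesis implementation; a real model would use a deep learning framework and learned weights.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w: Matrix, v: Vector) -> Vector:
    # Dense matrix-vector product: one output per row of w.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


class LSTMCell:
    """Hypothetical minimal LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # assumption: random weights stand in for trained ones
        self.hidden_size = hidden_size
        cols = input_size + hidden_size  # each gate sees the concatenation [x_t; h_{t-1}]
        self.w = {
            k: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_size)]
            for k in "ifog"
        }
        self.b = {k: [0.0] * hidden_size for k in "ifog"}
        self.b["f"] = [1.0] * hidden_size  # common practice: positive forget-gate bias

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # list concatenation of current input and previous hidden state
        pre = {
            k: [wx + bias for wx, bias in zip(matvec(self.w[k], z), self.b[k])]
            for k in "ifog"
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # New cell state keeps part of the old memory and adds gated new content.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def encode_sequence(cell: LSTMCell, frames: List[Vector]) -> List[Vector]:
    """Run the LSTM over per-frame feature vectors and return the hidden state at every step."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in frames:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    # Hypothetical stand-in for 5 frames of 4-dimensional ConvNet features.
    frames = [[0.1 * t, -0.2, 0.05 * t, 0.3] for t in range(5)]
    cell = LSTMCell(input_size=4, hidden_size=3)
    states = encode_sequence(cell, frames)
    print(len(states), len(states[-1]))  # prints "5 3"
```

Each hidden state would then be projected onto a pinyin vocabulary (for example with a linear layer and a softmax) to produce the pinyin sequence; that projection and the training procedure are left out because the abstract does not describe them.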
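The abstract reports two figures (46.7% and 58.5%) without defining how they are computed. One common way to evaluate sentence-level recognition is exact-match sentence accuracy plus a character-level accuracy derived from edit distance (1 minus the character error rate). The sketch below assumes those definitions; they may differ from the ones used in the thesis, and the function names and example sentences are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # delete r from the reference
                           cur[j - 1] + 1,            # insert h into the reference
                           prev[j - 1] + (r != h)))   # substitute (or match at cost 0)
        prev = cur
    return prev[-1]


def sentence_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed definition: fraction of sentences recognized exactly (refs must be non-empty)."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


def character_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed definition: 1 - character error rate, pooled over the whole test set."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 1.0 - errors / total


if __name__ == "__main__":
    refs = ["今天天气很好", "我们去北京"]  # hypothetical reference sentences
    hyps = ["今天天气很好", "我们去南京"]  # hypothetical recognizer output
    print(sentence_accuracy(refs, hyps))              # prints 0.5
    print(round(character_accuracy(refs, hyps), 3))   # prints 0.909
```

Because `edit_distance` accepts any sequence, the same functions also work on pinyin token lists such as `["ni", "hao"]`, which would allow the P2P and P2CC stages to be scored separately.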
Keywords: deep learning, convolutional neural networks, recurrent neural networks, lipreading recognition, Long Short-Term Memory