
Research On Lip Reading Method Based On Deep Learning

Posted on: 2020-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Yan
GTID: 2428330575478097
Subject: Electronic and communication engineering
Abstract/Summary:
Automatic lip reading recognizes what a person is saying by capturing and analyzing the movements of the lips during speech. It has broad application prospects in human-computer interaction, speech recognition, and video surveillance. A traditional automatic lip reading system generally consists of three steps: lip detection and localization, feature extraction, and recognition. However, the image preprocessing is complex, and handcrafted features are time-consuming to design, depend on empirical knowledge, and describe the lips incompletely. Training the classifier is also difficult. As a result, traditional lip reading methods have developed slowly and struggle to meet the requirements of practical applications.

Recently, deep learning has attracted increasing attention from researchers and has achieved breakthroughs in many fields, such as image recognition, human action recognition, speech recognition, and natural language processing. Deep learning avoids manual feature selection and the design of a dedicated high-performance classifier: it learns features directly from the raw data and makes a truly end-to-end recognition system possible. This paper studies deep learning methods for lip reading recognition and proposes a hybrid neural network architecture that integrates a convolutional neural network (CNN) and a recurrent neural network (RNN). The work is divided into the following four parts.

First, the database is preprocessed in two steps. A fixed number of frames is extracted from each video with a random sampling algorithm, and the lip region is then localized and extracted. The AdaBoost algorithm detects the face region, and the Dlib library then locates 68 facial key points. From the five key points that describe the lip region, the lip area, which is the object of study, is obtained accurately.

Second, spatial features are extracted directly from the static lip images with a CNN. A pre-trained AlexNet model, whose 8-layer structure integrates local features into global features, is used, and the spatial feature vector of its fc7 layer describes each lip image. These features offer stronger robustness and fault tolerance.

Third, because the lip movement information in a video exists not only within individual frames but also between frames, an RNN is added on top of the CNN to extract temporal features across the lip image sequence. To address the vanishing and exploding gradient problems that traditional RNNs suffer on long sequences, the paper analyzes an improved RNN model, long short-term memory (LSTM), to capture temporal features between lip frames. It further studies a bidirectional LSTM (BiLSTM) network, which captures correlations among frame features by processing the lip movement sequence in both the forward and backward directions. Dropout is also added to alleviate overfitting during training.

Finally, the temporal features learned by the BiLSTM are fed into a fully connected layer, a softmax classifier outputs the probability of each category, and the category with the highest probability is selected as the final recognition result.

By combining the ability of a CNN to process static images with the ability of an RNN to handle sequence data, the method captures lip motion information in the spatial and temporal dimensions simultaneously. The method is verified on a self-built experimental database, and the experimental results show that the proposed hybrid neural network model achieves better lip reading recognition performance.
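The preprocessing stage (random frame sampling, then cropping the lip region from facial landmarks) can be illustrated with a short standard-library Python sketch. It is not the thesis's code: the helper names (sample_frame_indices, lip_bounding_box, crop), the margin, and the representation of a frame as a list of pixel rows are hypothetical. The landmark coordinates are assumed to come from a detector such as Dlib's 68-point shape predictor, in which the mouth occupies indices 48 to 67 (0-based); the abstract does not say which five of those points were used, so the sketch takes whatever lip points it is given. Face detection with AdaBoost is not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), right/bottom edges exclusive


def sample_frame_indices(num_frames: int, k: int, seed: Optional[int] = None) -> List[int]:
    """Randomly choose k distinct frame indices and return them in temporal order."""
    if k > num_frames:
        raise ValueError("cannot sample more frames than the video contains")
    rng = random.Random(seed)
    # Sorting keeps the sampled frames in the order they were spoken.
    return sorted(rng.sample(range(num_frames), k))


def lip_bounding_box(points: Sequence[Tuple[int, int]], margin: int, width: int, height: int) -> Box:
    """Pad the tight box around the lip landmarks by `margin` pixels, clipped to the image."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0 = max(min(xs) - margin, 0)
    y0 = max(min(ys) - margin, 0)
    x1 = min(max(xs) + margin, width)
    y1 = min(max(ys) + margin, height)
    return x0, y0, x1, y1


def crop(frame: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the box out of a frame stored as a list of pixel rows."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]
```

Whether the box is computed per sampled frame or once per video, and how each crop is resized to the CNN's input size, are further assumptions left open here.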
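The temporal stage can be sketched as a bidirectional LSTM written in plain Python so that the gate equations stay visible. This is an illustrative assumption, not the thesis's implementation: the weight initialization, the tiny dimensions, and the helper names are hypothetical, and each input vector stands in for one frame's CNN feature (AlexNet's fc7 layer produces a 4096-dimensional vector). The abstract does not state how the BiLSTM outputs are summarized before the fully connected layer, so mean_over_time is only one assumed choice.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Each gate holds (W: input weights, U: recurrent weights, b: bias).
GateParams = Tuple[Matrix, Matrix, Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    """Logistic function, split by sign to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def init_lstm(input_dim: int, hidden_dim: int, seed: int) -> Dict[str, GateParams]:
    """Small random weights for the input (i), forget (f), output (o) gates and candidate (g)."""
    rng = random.Random(seed)

    def rand_matrix(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        gate: (rand_matrix(hidden_dim, input_dim), rand_matrix(hidden_dim, hidden_dim), [0.0] * hidden_dim)
        for gate in ("i", "f", "o", "g")
    }


def lstm_forward(params: Dict[str, GateParams], xs: List[Vector]) -> List[Vector]:
    """Run one LSTM direction over a sequence and return the hidden state at every step."""
    hidden_dim = len(params["i"][2])
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    outputs: List[Vector] = []
    for x in xs:
        # Pre-activation of each gate: W x_t + U h_{t-1} + b.
        pre = {
            gate: [wx + uh + bias for wx, uh, bias in zip(matvec(w, x), matvec(u, h), b)]
            for gate, (w, u, b) in params.items()
        }
        i = [sigmoid(z) for z in pre["i"]]
        f = [sigmoid(z) for z in pre["f"]]
        o = [sigmoid(z) for z in pre["o"]]
        g = [math.tanh(z) for z in pre["g"]]
        # Cell state: keep part of the old memory (f), add gated new content (i * g).
        c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        outputs.append(h)
    return outputs


def bilstm(fwd: Dict[str, GateParams], bwd: Dict[str, GateParams], xs: List[Vector]) -> List[Vector]:
    """Concatenate forward and backward hidden states at each time step."""
    forward = lstm_forward(fwd, xs)
    # The backward LSTM reads the sequence in reverse; flip its outputs back into time order.
    backward = lstm_forward(bwd, xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def mean_over_time(outputs: List[Vector]) -> Vector:
    """Average the per-step outputs into one sequence-level feature (an assumed summary)."""
    n = len(outputs)
    return [sum(col) / n for col in zip(*outputs)]
```

For example, with 8-dimensional frame features, bilstm(init_lstm(8, 4, seed=0), init_lstm(8, 4, seed=1), frames) returns one 8-dimensional vector per frame: 4 values from the forward pass and 4 from the backward pass.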
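The classification stage (dropout, a fully connected layer, softmax, and selection of the most probable category) can be sketched as follows. This minimal plain-Python version assumes inverted dropout, where surviving units are rescaled by 1/(1 - p) during training and dropout is the identity at inference; the abstract mentions dropout but not its rate or placement, and the weights passed to classify would normally be learned rather than hand-set.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def dropout(v: Vector, p: float, rng: random.Random, training: bool) -> Vector:
    """Inverted dropout: zero each unit with probability p during training and rescale the survivors."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(v)  # identity at inference time
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in v]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: Vector, weights: List[Vector], bias: Vector) -> Tuple[Vector, int]:
    """Fully connected layer + softmax; returns class probabilities and the argmax label."""
    logits = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return probs, label
```

Because softmax is monotonic, picking the most probable category is equivalent to picking the largest logit; the probabilities are still useful when a confidence score is needed.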
Keywords/Search Tags: automatic lip reading, deep learning, convolutional neural network, recurrent neural network