
Lip-reading Recognition Based On Spatio-Temporal Convolution And Bidirectional GRU

Posted on: 2020-09-13  Degree: Master  Type: Thesis
Country: China  Candidate: Y F Shen  Full Text: PDF
GTID: 2428330590477043  Subject: Computer application technology
Abstract/Summary:
Lipreading is a technique for recognizing speech content solely from the visual information of a speaker's lip movements. It has been widely applied to lip-based interactive control, silent message input, speech recognition in noisy environments, and silent video recognition. It is also of great significance for research on auxiliary authentication and public security, as well as for helping deaf people communicate. However, lipreading is a very difficult task for humans, and traditional machine learning methods and models are time-consuming when extracting lip-movement features and achieve poor recognition performance. Moreover, very few Chinese lipreading datasets are available; with limited data, the application value is also limited. To address these problems in Chinese lipreading, the main idea of this thesis is to build a large Chinese lipreading dataset, use multiple spatio-temporal CNNs (multi-STCNNs) and multiple bidirectional GRUs (multi-Bi-GRUs) to extract lip features, and train the model end-to-end, so as to implement sentence-level Chinese lipreading. The main research contents and contributions of this thesis are as follows:

(1) First, this thesis implements a client named 'Lipreading Video' on the iOS system to construct the Chinese lipreading dataset. The client allows different users to record lip-language video data: a user records a lip-language video matching the text displayed by the client and can then review, re-record, or upload the video to the server. The system applies a voice activity detection (VAD) algorithm to detect and segment the collected lip-language video data, automatically marking the start and end timestamps of each word spoken. It then uses an AdaBoost cascade classifier based on Haar-like features to detect and locate the face and extract the lip region (see the extraction sketch below). This scheme labels lip-language video data in batches and saves a great deal of manual labor.

(2) This thesis proposes the 'ChineseLipNet' model, an end-to-end model based on multi-STCNNs and multi-Bi-GRUs (see the model sketch below). For the input lip-language video data, the multi-STCNNs first extract features and the max-pooling layers reduce their dimensionality; these steps extract good features without any manual annotation. The multi-Bi-GRUs then process these features and learn to predict or recognize the sequences; being bidirectional, they allow the model to use information from the current time-step as well as future time-steps. Finally, a fully connected layer and a softmax layer produce the output prediction. This thesis evaluates the ChineseLipNet model and compares it with human lipreaders, the AlexNet model, and the VGG model. The experimental results show that the accuracy of ChineseLipNet is significantly higher than that of human lipreading and better than that of AlexNet and VGG. At the same time, ChineseLipNet has fewer parameters, a shorter training time, and faster convergence. Therefore, ChineseLipNet is not only suitable for training on large-scale lipreading datasets but is also well suited to deployment on portable terminal devices for recognition, giving it higher application value.
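As an illustration of the lip-region extraction step in contribution (1), the following is a minimal Python/OpenCV sketch that uses the library's bundled frontal-face Haar cascade (an AdaBoost cascade over Haar-like features). The crop ratios and output size are illustrative assumptions, not the values used in the thesis.

```python
# Minimal sketch of Haar-cascade face detection plus a heuristic lip crop.
# Assumes OpenCV's bundled frontal-face cascade; crop ratios are illustrative.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def extract_lip_region(frame):
    """Detect the largest face in a BGR frame and crop its lower third as the lip region."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda box: box[2] * box[3])  # keep the largest face
    # The mouth lies roughly in the lower third and middle half of the face box.
    lip = frame[y + 2 * h // 3 : y + h, x + w // 4 : x + 3 * w // 4]
    return cv2.resize(lip, (100, 50))  # fixed-size crop for the downstream model
```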
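Similarly, the following is a minimal PyTorch sketch of the kind of architecture described in contribution (2): stacked spatio-temporal (3D) convolutions with max-pooling, followed by bidirectional GRUs and a fully connected classifier. The layer counts, channel widths, hidden size, and vocabulary size are illustrative assumptions rather than the thesis's actual ChineseLipNet configuration.

```python
# Sketch of an STCNN + Bi-GRU lipreading model; all hyperparameters are assumed.
import torch
import torch.nn as nn

class LipNetLikeModel(nn.Module):
    def __init__(self, vocab_size=30, hidden=256):
        super().__init__()
        # Spatio-temporal feature extractor: Conv3d over (time, height, width).
        self.stcnn = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # Bidirectional GRUs use both past and future time-steps.
        self.gru = nn.GRU(input_size=64 * 12 * 25, hidden_size=hidden,
                          num_layers=2, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, vocab_size)

    def forward(self, x):              # x: (batch, 3, frames, 50, 100)
        feats = self.stcnn(x)          # (batch, 64, frames, 12, 25)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.gru(feats)       # (batch, frames, 2 * hidden)
        return self.fc(out)            # per-frame logits; softmax applied in the loss
```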
Keywords/Search Tags: Deep learning, lip recognition, Chinese lipreading, STCNN, Bi-GRU