
Database Construction And Algorithm Research Of Visual Speech Recognition Based On Deep Learning

Posted on: 2020-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: M M Yang
GTID: 2428330590950608
Subject: Software engineering
Abstract/Summary:
In recent years, deep learning has achieved remarkable results in many fields, including computer vision. Visual speech recognition (lip reading), one of the most challenging research topics in computer vision, aims to infer the spoken text by observing a sequence of consecutive lip images. However, the diversity of lip movements and the richness of language make lip reading difficult, and research progress has been relatively slow. Deep learning algorithms depend on large amounts of data, yet the open-source lip reading datasets currently available in academia are all in English. To lay a foundation for future research on Chinese lip reading, the first contribution of this work is LRW-1000, the first Chinese lip reading database, together with a complete description of the process and algorithmic details used to construct it. LRW-1000 is the word-level lip reading database with the largest number of classes and the largest number of speakers.

To address the difficulty of the lip reading task, we also propose a new lip reading algorithm. By improving the existing DenseNet feature extractor, the algorithm strengthens the model's ability to capture short-term dependencies, and the multi-scale features it learns are more robust to changes in resolution. Because different text content is associated with different regions of the face to different degrees, we introduce a new attention mechanism that helps the network focus on the most relevant regions. Using image information only, the proposed method achieves the best results on the mainstream lip reading databases LRW and GRID: its classification accuracy on LRW is 82.73%, exceeding the previous best result by 1.43%, and its word error rate (WER) on GRID is 12.8%, an improvement of 9.2% over the previous best result. On the self-built Chinese dataset LRW-1000, the proposed algorithm also outperforms current mainstream models.
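The GRID result is reported as a word error rate (WER), for which lower is better. As a minimal sketch, assuming the standard definition WER = (S + D + I) / N (word substitutions, deletions, and insertions divided by the number of reference words) rather than the thesis's own evaluation script, WER can be computed from a word-level Levenshtein distance. The function name `word_error_rate` and the example sentences are illustrative assumptions, written in the style of GRID's fixed six-word sentences.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level Levenshtein distance.

    Hypothetical helper for illustration; assumes whitespace-tokenized words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (dynamic programming, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr

    # Total edits divided by the reference length N.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words gives WER = 1/6.
    print(word_error_rate("place red at c four please", "place red in c four please"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference. Over a whole test set, WER is commonly computed as total edits divided by total reference words rather than as an average of per-sentence rates; which convention the thesis uses is not stated here.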
Keywords/Search Tags: Deep Learning, Lip Reading, Spatio-Temporal Convolution, Recurrent Neural Networks, Attention Mechanism