Database Construction And Algorithm Research Of Visual Speech Recognition Based On Deep Learning

Posted on:2020-08-04

Degree:Master

Type:Thesis

Country:China

Candidate:M M Yang

Full Text:PDF

GTID:2428330590950608

Subject:Software engineering

Abstract/Summary:

In recent years, deep learning has achieved remarkable results in many fields, including computer vision. Visual speech recognition, one of the most challenging research topics in computer vision, aims to infer the spoken text from a sequence of consecutive lip images. However, the diversity of lip movements and the richness of language itself make lip reading a difficult task, and research progress has been relatively slow. Deep learning algorithms depend on large amounts of data, yet the open-source datasets currently available to the academic community are all in English. To lay a solid foundation for future research on Chinese lip reading, the first contribution of this work is LRW-1000, the first Chinese lip reading database, together with a description of the complete construction process and algorithmic details. LRW-1000 is the word-level lip reading database with the largest number of classes and the largest number of speakers.

Second, to address the difficulty of the lip reading task, a new lip reading algorithm is proposed. By improving the existing DenseNet feature extractor, it strengthens the model's ability to capture short-term temporal dependencies, and the multi-scale features it learns are more robust to changes in resolution. Because different words are associated with different regions of the face to different degrees, a new attention mechanism is introduced to help the network focus on the most relevant regions.

Using only image information, the proposed method achieves the current best results on the mainstream lip reading databases LRW and GRID: a classification accuracy of 82.73% on LRW, exceeding the previous best result by 1.43%, and a word error rate (WER) of 12.8% on GRID, improving on the previous best result by 9.2%. On the self-built Chinese dataset LRW-1000, the proposed algorithm also outperforms current mainstream models.
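The abstract does not say how the attention mechanism scores facial regions. As a generic illustration only, the sketch below shows soft spatial attention in plain Python: each region's feature vector is scored against a context vector with a scaled dot product, the scores are normalized with a softmax, and the region features are averaged with those weights. The region/feature layout, the `query` context vector, and the function name `spatial_attention_pool` are all hypothetical assumptions; this is not the thesis's own mechanism.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention_pool(
    region_feats: List[Vector], query: Vector
) -> Tuple[Vector, Vector]:
    """Soft spatial attention over facial regions (hypothetical sketch).

    region_feats: one feature vector per spatial region (assumed layout,
        e.g. cells of a CNN feature map over the face crop).
    query: assumed context vector, e.g. a recurrent hidden state.
    Returns (attention weights, attended feature vector).
    """
    if not region_feats or not query:
        raise ValueError("need at least one region and a non-empty query")
    dim = len(query)
    if any(len(f) != dim for f in region_feats):
        raise ValueError("feature and query dimensions must match")
    scale = math.sqrt(dim)
    # Scaled dot-product relevance score of each region to the query.
    scores = [sum(f * q for f, q in zip(feat, query)) / scale for feat in region_feats]
    # Normalize scores into weights that are positive and sum to 1.
    weights = softmax(scores)
    # Weighted sum of region features: higher-scoring regions contribute more.
    pooled = [sum(w * feat[d] for w, feat in zip(weights, region_feats)) for d in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    w, p = spatial_attention_pool([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [2.0, 0.0])
    print("weights:", [round(x, 3) for x in w])
    print("pooled:", [round(x, 3) for x in p])
```

In the usage example, the first region, whose feature is most aligned with the query, receives the largest weight, and the weights sum to 1. In a real model the region features and the query would come from learned layers and the computation would run on batched tensors rather than Python lists.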

Keywords/Search Tags:

Deep Learning, Lip Reading, Spatio-Temporal Convolution, Recurrent Neural Networks, Attention Mechanism

Related items

1	Image Captioning Based On Deep Recurrent Convolution Network And Spatio-temporal Information Fusion
2	Research On Prediction Of Individual Movement Behavior At Urban Scale Based On Deep Learning
3	Human Action Recognition Based On Spatio-temporal Network And Attention Mechanism
4	Video Action Recognition Based On 2D Convolution Network Under Spatio-Temporal Feature Enhancement Mechanism
5	Research On Video Event Recognition Using Deep Network Spatio-temporal Consistency
6	Research And Design Of Lip Reading Based On 3D Convolution
7	Research On Feature Fusion Method Of Speech Emotion Recognition Based On Deep Learning
8	Research And Implementation Of Lip-reading System Based On Deep Learning
9	Question Classification Based On Deep Learning Model
10	Researches For Lip Reading Based On Lightweight Convolution And Attention Mechanism