Online Sentence-Level Lip Reading Recognition Based On Video Convolutional Neural Networks

Posted on:2021-03-03

Degree:Master

Type:Thesis

Country:China

Candidate:L Liu

Full Text:PDF

GTID:2428330614965726

Subject:Computer technology

Abstract/Summary:

Reading the lip movements of speakers in video is a challenging problem in pattern recognition. The main task is to apply deep learning methods such as improved convolutional neural networks, time series prediction, and probabilistic modeling to sequential lip data, and to identify the sentences spoken by the people in the video from the extracted lip movement information. Recognition algorithms are steadily being extended to video processing and mining, but the analysis of moving speakers in video, and in particular the alignment of lip movement information with sentence text, needs further exploration. This thesis aims to recognize the lip movements of individual speakers in video effectively. First, a lip reading recognition data set is built in-house, pairing speaker videos from multiple scenes with the corresponding sentence text label sequences for training. Second, a semantic extraction method for speakers' lips based on convolutional neural networks is designed to achieve lip region division and multi-level extraction of lip features. Finally, an online sentence-level lip reading recognition method based on time series prediction is designed to associate and align speakers' lip movements with sentence sequences, and to recognize and display the results online. The contributions of this thesis lie mainly in the following three aspects: (1) A large number of program videos containing speaking people are collected and preprocessed. Combining three-stage depthwise separable convolutional neural networks with an improved non-maximum suppression algorithm, faces are detected and then tracked continuously by a Kalman filter. The video frame sequences containing faces, together with the text labels corresponding to the audio, are added to the training set to establish a local lip reading recognition data set. (2) K-means clustering is used to divide the roughly selected lip region, lip candidate boxes are obtained through fully convolutional networks, and the lip feature semantics of speakers under multi-level convolution are extracted by residual networks that integrate spatiotemporal and multi-channel information. (3) The key information of the spoken sentence content in the video sequences is memorized by a bidirectional gated recurrent unit, and a connectionist temporal classification algorithm based on a hybrid attention mechanism is introduced to align the text labels with the characters in the sentences, so as to synchronize them with the lip movement contour. The recognized sentence sequences are displayed online by combining a web framework and a cloud storage platform.
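The alignment step in contribution (3) relies on connectionist temporal classification (CTC), which maps a per-frame sequence of symbol scores to a shorter text by merging repeated symbols and dropping a special blank symbol. The following is a minimal sketch of the standard greedy CTC decoding rule only; it does not model the hybrid attention mechanism described in the thesis, and the function name `ctc_greedy_decode`, the blank index, the toy vocabulary, and the scores are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Pick the best symbol per frame, merge repeats, then drop blanks."""
    # Index of the highest-scoring symbol for each video frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of consecutive identical symbols into one.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # Remove blanks and map the remaining indices to text.
    return "".join(vocab[idx] for idx in collapsed if idx != blank)


# Toy input: 6 frames over a vocabulary of blank plus two letters.
# The scores are made up for illustration.
vocab = ["-", "h", "i"]
frame_scores = [
    [0.10, 0.80, 0.10],  # h
    [0.10, 0.70, 0.20],  # h (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # i
    [0.20, 0.10, 0.70],  # i (repeat, merged with the previous frame)
    [0.80, 0.10, 0.10],  # blank
]
print(ctc_greedy_decode(frame_scores, vocab))  # prints "hi"
```

Because repeats are merged before blanks are removed, a blank between two identical symbols keeps them distinct, which is how CTC can emit doubled letters from frame-level predictions.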

Keywords/Search Tags:

Convolutional Neural Networks, Lip Reading Recognition, Online, Sentence-Level, Semantic Extraction

Related items

1	Research On Sentence Classification Model Based On Deep Feature Extraction
2	Research And Design Of Lip Reading Based On 3D Convolution
3	Research On Key Techniques Of Relation Extraction For Text Data
4	Patent Document Semantic Retrieval Based On Character Convolutional Neural Network
5	Research On Recognition Algorithms For Continuous Sign Language Sentence Based On Kinect
6	Research On Lip Reading Recognition Based On Deep Learning
7	Static Gesture Recognition In Complex Background Based On Convolutional Neural Network
8	On-Line Handwritten Chinese Character Recognition Approach Based On Sentence Level
9	On-line Handwritten Chinese Character Recognition Approach Based On Sentence Level
10	Research On Opinion Object Extraction For Online Hot News Reviews