Online Sentence-Level Lip Reading Recognition Based On Video Convolutional Neural Networks

Posted on: 2021-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
GTID: 2428330614965726
Subject: Computer technology
Abstract/Summary:
Reading the lip movements of people in videos is a challenging data analysis problem in pattern recognition. The main task is to apply methods such as improved deep convolutional neural networks, time-series prediction, and probabilistic modeling to sequential lip data, and to recognize the sentences spoken by people in a video from the extracted lip movement information. Recognition algorithms are steadily being extended to video processing and mining, but dynamic analysis of people in videos, and in particular the alignment of lip movement information with sentence text, still needs further exploration.
This thesis aims to recognize the lip movements of individual speakers in videos effectively. First, a self-built lip-reading dataset is constructed for training, pairing speaker videos recorded in multiple scenes with the corresponding sentence-level text label sequences. Next, a convolutional neural network based method for extracting the lip semantics of speakers is designed, which performs lip region segmentation and multi-level lip feature extraction. Finally, an online sentence-level lip-reading method based on time-series prediction is designed, which associates and aligns the speakers' lip movements with sentence sequences and carries out online recognition and display.
The innovations of this thesis lie mainly in the following three aspects:
(1) A large number of program videos featuring speaking people are collected and preprocessed. Faces are detected by three-stage depthwise separable convolutional neural networks combined with an improved non-maximum suppression algorithm, and are then tracked continuously with a Kalman filter. The face-containing video frame sequences, together with text labels corresponding to the audio, are added to the training data, completing the construction of a local lip-reading dataset.
(2) K-means clustering is used to segment the roughly selected lip region, lip candidate boxes are obtained through fully convolutional networks, and residual networks that integrate spatiotemporal and multi-channel information extract the lip feature semantics of the speakers at multiple convolutional levels.
(3) A bidirectional gated recurrent unit (BiGRU) memorizes the key information about the spoken sentence content in the video sequences, and a connectionist temporal classification (CTC) algorithm based on a hybrid attention mechanism is introduced to align the text labels with the characters in the sentences, thereby synchronizing them with the lip movement contours. The recognized sentence sequences are displayed online through a web framework combined with a cloud storage platform.
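Aspect (1) relies on non-maximum suppression (NMS) to prune overlapping face detections. The abstract does not describe what the "improved" variant changes, so the following is only a minimal sketch of standard greedy NMS as a baseline, assuming boxes in (x1, y1, x2, y2) pixel format and an IoU threshold of 0.5; the names `iou` and `nms` are illustrative, not taken from the thesis.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Standard greedy NMS; returns indices of the kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress every remaining box that overlaps the kept one too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    faces = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    confidences = [0.9, 0.8, 0.7]
    print(nms(faces, confidences))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the distant third box survives. Libraries such as torchvision ship the same baseline as `torchvision.ops.nms`.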
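Aspect (1) also tracks detected faces with a Kalman filter, but the abstract does not give the state model. A common choice, assumed here, is a constant-velocity model applied independently to each face-box coordinate (for example the centre x), with one step per video frame. The class name and the noise parameters `q` and `r` below are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small dense matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


@dataclass
class ConstantVelocityKalman1D:
    """Tracks one box coordinate with state [position, velocity] (assumed model)."""
    dt: float = 1.0   # one video frame per step (assumption)
    q: float = 1e-2   # process-noise scale (hypothetical value)
    r: float = 4.0    # detector measurement variance in pixels^2 (hypothetical value)
    x: List[float] = field(default_factory=lambda: [0.0, 0.0])
    P: Matrix = field(default_factory=lambda: [[1e3, 0.0], [0.0, 1e3]])

    def predict(self) -> float:
        """Propagate the state one frame ahead; usable alone when detection fails."""
        F = [[1.0, self.dt], [0.0, 1.0]]
        Q = [[self.q * self.dt ** 3 / 3, self.q * self.dt ** 2 / 2],
             [self.q * self.dt ** 2 / 2, self.q * self.dt]]
        self.x = [self.x[0] + self.dt * self.x[1], self.x[1]]
        FPFt = matmul(matmul(F, self.P), transpose(F))
        self.P = [[FPFt[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
        return self.x[0]

    def update(self, z: float) -> float:
        """Correct the prediction with a detected position z (H = [1, 0])."""
        y = z - self.x[0]                      # innovation
        s = self.P[0][0] + self.r              # innovation variance
        k = [self.P[0][0] / s, self.P[1][0] / s]  # Kalman gain
        self.x = [self.x[0] + k[0] * y, self.x[1] + k[1] * y]
        p = self.P
        self.P = [[(1 - k[0]) * p[0][0], (1 - k[0]) * p[0][1]],
                  [p[1][0] - k[1] * p[0][0], p[1][1] - k[1] * p[0][1]]]
        return self.x[0]


if __name__ == "__main__":
    kf = ConstantVelocityKalman1D()
    for t in range(30):
        kf.predict()
        kf.update(100.0 + 2.0 * t)  # face centre moving 2 px per frame
    # Simulated missed detection: coast on the prediction alone.
    print(round(kf.predict(), 1))   # extrapolates to roughly 160
```

The predict-only step is what lets a tracker bridge frames where the detector misses the face.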
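Aspect (2) segments the roughly selected lip region with K-means clustering. The abstract does not state which pixel features are clustered, so this sketch assumes RGB colour with k = 2 (lip vs. surrounding skin) and labels the cluster with the larger R minus G value as lip. It uses Lloyd's algorithm with a deterministic farthest-point initialisation, a simplification of k-means++ chosen here only for reproducibility; `kmeans` and `lip_mask` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # one feature vector per pixel, e.g. (R, G, B)


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, iters: int = 50) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(tuple(max(points, key=lambda p: min(_dist2(p, c) for c in centroids))))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def lip_mask(pixels: Sequence[Point]) -> List[bool]:
    """Split pixels into two clusters and flag the redder one (largest R - G) as lip."""
    labels, centroids = kmeans(pixels, k=2)
    lip_cluster = max(range(2), key=lambda j: centroids[j][0] - centroids[j][1])
    return [lab == lip_cluster for lab in labels]


if __name__ == "__main__":
    skin = [(200, 160, 140), (205, 165, 145), (198, 158, 138)]
    lips = [(170, 80, 90), (165, 75, 85), (175, 85, 95)]
    print(lip_mask(skin + lips))  # [False, False, False, True, True, True]
```

In a real pipeline the pixels would come from the rough lip crop, and the resulting mask would only seed the candidate boxes that the fully convolutional network then refines.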
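Aspect (3) uses connectionist temporal classification (CTC) to align text labels with frame-level outputs. The hybrid attention mechanism and the BiGRU are not reproduced here. This sketch shows only the two generic CTC pieces: the forward algorithm, which sums the probability of a label sequence over all of its frame alignments, and best-path (greedy) decoding. The per-frame probabilities stand in for the BiGRU's softmax outputs and are toy values. Blank index 0 is an assumed convention.

```python
from typing import List, Optional, Sequence

BLANK = 0  # index of the CTC blank symbol (assumed convention)


def collapse(path: Sequence[int], blank: int = BLANK) -> List[int]:
    """CTC collapsing map: merge consecutive repeats, then remove blanks."""
    out: List[int] = []
    prev: Optional[int] = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path decoding: take the arg-max label in each frame, then collapse."""
    best_path = [max(range(len(p)), key=lambda i: p[i]) for p in frame_probs]
    return collapse(best_path, blank)


def ctc_sequence_prob(frame_probs: Sequence[Sequence[float]], target: Sequence[int],
                      blank: int = BLANK) -> float:
    """P(target | frames) summed over all alignments, via the CTC forward algorithm.

    frame_probs[t][c] is the (already normalised) probability of label c at frame t.
    Plain probabilities are used for readability; real systems work in log space.
    """
    ext = [blank]
    for c in target:
        ext += [c, blank]  # blank-interleaved target: _ c1 _ c2 _ ...
    S, T = len(ext), len(frame_probs)
    alpha = [0.0] * S
    alpha[0] = frame_probs[0][blank]
    if S > 1:
        alpha[1] = frame_probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * frame_probs[t][ext[s]]
        alpha = new
    # A valid alignment ends on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


if __name__ == "__main__":
    # Toy vocabulary: 0 = blank, 1 = "h", 2 = "i"; five frames of softmax output.
    probs = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05],
             [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
    print(ctc_greedy_decode(probs))  # [1, 2]
    print(ctc_sequence_prob(probs, [1, 2]))
```

The blank symbol is what lets CTC represent repeated characters (the path `1 0 1` collapses to `[1, 1]`) and frames with no new output. For training, a framework implementation such as PyTorch's `torch.nn.CTCLoss` would normally be used instead of this plain-Python version.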
Keywords/Search Tags: Convolutional Neural Networks, Lip Reading Recognition, Online, Sentence-Level, Semantic Extraction