
Research On Feature Extraction Technology Of Lip Motion Information

Posted on: 2020-08-30  Degree: Master  Type: Thesis
Country: China  Candidate: K Gu
GTID: 2428330575478094  Subject: Electronic and communication engineering
Abstract/Summary:
Lip reading is the task of recognizing what a speaker says from the visual motion of the lips. In recent years, mobile Internet technology has developed rapidly and its applications have grown markedly, which has in turn promoted the development of human-computer interaction and artificial intelligence. This thesis recognizes spoken content from visual features of the mouth region. Traditional feature extraction uses only the static information in individual frames, discarding the dynamic information between frames.

This thesis proposes a novel approach based on Dynamic Texture (DT) recognition and also considers its simplifications and extensions for image analysis. First, textures are modeled with volume local binary patterns (VLBP), which combine motion and appearance. To keep computational complexity low and improve generalization, the algorithm is extended to the space-time domain and to multi-dimensional space. The lip motion scene is modeled with spatial texture and temporal motion information, considering only the co-occurrences of local binary patterns on five intersecting planes. The advantages of this approach include local processing, robustness to monotonic gray-scale changes, and simple computation.

Next, the dynamic texture features extracted in the previous step are fed into a Stacked Sparse Autoencoder (SSAE), which uses greedy, layer-wise unsupervised learning to extract high-level features. Finally, all layers of the SSAE are treated as a whole and fine-tuned with Back Propagation (BP) under supervised learning, and the extracted features are passed to a SOFTMAX classifier. The convex function optimization problem is solved effectively by widening the activation range of the activation function. The resulting SSAE is trained as a multilayer perceptron that performs both hierarchical feature extraction and classification.

Experiments show that the proposed model recognizes lip movement image sequences effectively. On English recognition tasks, compared with traditional models, the improved algorithm achieves higher intra-class similarity and better inter-class separation. The overall system runs stably and is robust to environmental changes such as illumination. Higher recognition rates are obtained on a self-built database.
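To make the dynamic texture step more concrete, here is a minimal sketch of LBP features computed on orthogonal planes of a video volume. It assumes the common three-orthogonal-plane simplification of VLBP (LBP-TOP: XY for appearance, XT and YT for motion) with 8 neighbours at radius 1; the thesis's own five-plane configuration is not specified in the abstract and is not reproduced here. The function names `lbp_code` and `lbp_top_histograms` are hypothetical.

```python
# Sketch of LBP-TOP-style dynamic texture features.
# Assumption (not from the thesis): 3 orthogonal planes, 8 neighbours, radius 1.

# Offsets (row, col) of the 8 neighbours on a 2-D plane, listed circularly.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(center, neighbours):
    """Threshold each neighbour against the centre and pack the bits into 0..255."""
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code


def lbp_top_histograms(volume):
    """volume[t][y][x] holds grey levels (T, H, W >= 3).

    Returns one 768-value feature: normalised 256-bin histograms for the
    XY (appearance), XT and YT (motion) planes, concatenated in that order.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    hists = [[0] * 256 for _ in range(3)]
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                c = volume[t][y][x]
                xy = [volume[t][y + a][x + b] for a, b in RING]  # spatial ring
                xt = [volume[t + a][y][x + b] for a, b in RING]  # horizontal motion
                yt = [volume[t + a][y + b][x] for a, b in RING]  # vertical motion
                for plane, ring in enumerate((xy, xt, yt)):
                    hists[plane][lbp_code(c, ring)] += 1
    feature = []
    for h in hists:
        total = sum(h) or 1
        feature.extend(count / total for count in h)
    return feature
```

Because each code depends only on whether a neighbour is at least as bright as the centre, any strictly increasing grey-level transform (for example a global brightness or contrast change) leaves the feature unchanged, which is the robustness to monotonic gray-scale changes mentioned above. In a lip-reading pipeline the volume would be a cropped mouth-region sequence, and the resulting vector would serve as input to the next stage.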
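The SSAE and SOFTMAX stages can be illustrated with a small forward-pass sketch. It assumes the standard sparse autoencoder formulation (sigmoid hidden units and a KL-divergence sparsity penalty with target activation `rho`); the abstract does not state the exact cost function, and it mentions a modified activation range that this sketch does not attempt to reproduce. Greedy layer-wise pretraining and BP fine-tuning are omitted; only the encoder pass, the sparsity term, and the softmax output are shown. All names and the default `rho=0.05` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, weights, bias):
    """One autoencoder hidden layer: h_j = sigmoid(sum_i W[j][i] * x_i + b_j)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def kl_sparsity_penalty(hidden_batch, rho=0.05):
    """Sum over hidden units j of KL(rho || rho_hat_j), where rho_hat_j is
    the mean activation of unit j over the batch (values must lie in (0, 1))."""
    n = len(hidden_batch)
    penalty = 0.0
    for j in range(len(hidden_batch[0])):
        rho_hat = sum(h[j] for h in hidden_batch) / n
        penalty += (rho * math.log(rho / rho_hat)
                    + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))
    return penalty


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stacked_forward(x, layers, classifier_weights):
    """Pass x through each (weights, bias) encoder in turn, then a softmax layer."""
    h = x
    for weights, bias in layers:
        h = encode(h, weights, bias)
    scores = [sum(w * hi for w, hi in zip(row, h)) for row in classifier_weights]
    return softmax(scores)
```

During pretraining, the sparsity penalty would be added to each layer's reconstruction error so that most hidden units stay close to inactive; after stacking, fine-tuning would backpropagate the classification loss through all encoder layers and the softmax weights together.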
Keywords/Search Tags: Lip reading, dynamic information, high-level features, SSAE