Research On Feature Extraction Technology Of Lip Motion Information

Posted on:2020-08-30

Degree:Master

Type:Thesis

Country:China

Candidate:K Gu

Full Text:PDF

GTID:2428330575478094

Subject:Electronic and communication engineering

Abstract/Summary:

Lip reading is the task of recognizing what a speaker is saying from the visual motion of the lips. In recent years, mobile Internet technology has developed rapidly and its applications have grown steadily, which has promoted the development of human-computer interaction and artificial intelligence. This paper recognizes spoken content from the visual features of the mouth. Traditional feature extraction considers only static information, so the dynamic information between frames is wasted. This paper proposes a novel approach for recognizing dynamic texture (DT), and also considers its simplifications and extensions for image analysis. First, textures are modeled with volume local binary patterns (VLBP), which combine motion and appearance. To keep the algorithm low in complexity and make it general, this paper extends it to the space-time domain and to multi-dimensional space. Spatial texture and temporal motion information are used to model the lip motion scene, and only the co-occurrences of the local binary patterns on five intersecting planes are considered. The advantages of this approach include local processing, robustness to monotonic gray-scale changes, and simple computation. Next, the dynamic texture features extracted in the previous step are fed into a stacked sparse autoencoder (SSAE), which uses greedy unsupervised learning to extract high-level features. Finally, all layers of the SSAE are treated as a whole and fine-tuned with back propagation (BP) under supervised learning, and the extracted features are passed to a softmax classifier. The convex optimization problem is handled effectively by improving the activation range of the activation function. The resulting SSAE is trained as a multilayer perceptron that performs hierarchical feature extraction and data classification. Experiments show that the proposed model can effectively recognize lip movement image sequences. In English recognition tasks, compared with traditional models, the improved algorithm shows good similarity within classes and good separation between classes. The overall system runs stably and is robust to environmental changes such as illumination. Better recognition results are obtained on the self-made database.
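
The idea of combining appearance and motion through local binary patterns on planes that cut a video volume can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say which five planes are used, so the sketch assumes the common simplification of three orthogonal slices (XY, XT, YT) through the volume centre, a basic 8-neighbour pattern of radius 1, and hypothetical names (`lbp_code`, `lbp_histogram`, `three_plane_descriptor`). The last lines check the robustness to monotonic gray-scale changes mentioned in the abstract.

```python
from typing import List

Plane = List[List[int]]          # 2D gray-scale slice
Volume = List[List[List[int]]]   # volume[t][y][x]

# Clockwise 8-neighbourhood offsets (row, col) at radius 1 (an assumed layout).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(plane: Plane, r: int, c: int) -> int:
    """Basic 8-bit LBP code: threshold each neighbour against the centre pixel."""
    centre = plane[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if plane[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(plane: Plane) -> List[int]:
    """256-bin histogram of LBP codes over the interior pixels of one plane."""
    hist = [0] * 256
    for r in range(1, len(plane) - 1):
        for c in range(1, len(plane[0]) - 1):
            hist[lbp_code(plane, r, c)] += 1
    return hist


def three_plane_descriptor(volume: Volume) -> List[int]:
    """Concatenate LBP histograms of the XY, XT and YT slices through the centre."""
    t_mid = len(volume) // 2
    y_mid = len(volume[0]) // 2
    x_mid = len(volume[0][0]) // 2
    xy = volume[t_mid]                                         # appearance
    xt = [frame[y_mid] for frame in volume]                    # rows = time, cols = x
    yt = [[row[x_mid] for row in frame] for frame in volume]   # rows = time, cols = y
    return lbp_histogram(xy) + lbp_histogram(xt) + lbp_histogram(yt)


# Toy 3-frame, 3x3 volume (made-up values, not data from the thesis).
volume = [
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
    [[15, 25, 35], [45, 55, 65], [75, 85, 95]],
    [[90, 80, 70], [60, 50, 40], [30, 20, 10]],
]
descriptor = three_plane_descriptor(volume)

# A monotonically increasing gray-scale change keeps every comparison,
# so the descriptor is unchanged.
brighter = [[[2 * p + 7 for p in row] for row in frame] for frame in volume]
same_after_change = three_plane_descriptor(brighter) == descriptor
```

In a pipeline like the one the abstract describes, a descriptor of this kind would be the input to the autoencoder stage. A practical version would typically compute histograms over all slices and over sub-blocks of the mouth region rather than over a single central slice per plane.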

Keywords/Search Tags:

Lip reading, dynamic information, high-level features, SSAE

Related items

1	A probabilistic framework for mapping audio-visual features to high-level semantics in terms of concepts and context
2	Features Extraction And Recognition Of Lip-reading Based On The Inner Lip
3	Video Retrieval System Based On MPEG-7 Low-level Features
4	Research On Image Semantic Classification Techniques In Content-Based Image Retrieval
5	Research On Scene Classification Of Mid-Level Features
6	High-Level Video Semantic Concept Detection Based On Multiple Features
7	Research On Cross-Level Saliency Detection Based On High-Level Semantic Feature
8	A Breast Ultrasound Classification Method Based On High-level Semantic Features Mapping
9	A Study On Reading Difference Between Different Reading Medium
10	Medical Image Retrieval Based On Low Level Features And Semantic Features