
Research on Lip-Reading Technology Fused with Physiological Information

Posted on: 2019-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: F Yang
Full Text: PDF
GTID: 2428330626452090
Subject: Computer Science and Technology
Abstract/Summary:
As a bridge between people and computers or other devices, human-computer interaction technology has shifted significantly, driven by intelligent technology and user demand, from the mouse and keyboard to non-contact interaction based on multimodal information. As an important non-contact interaction method, lip reading not only overcomes the limitations of application scenarios and assists speech recognition in noisy environments, but also has broader development prospects with the emergence of three-dimensional sensors. Comprehensive extraction and effective characterization of lip-motion information are directly tied to the accurate expression of semantic information: the completeness and representational power of the extracted lip-motion features directly affect the recognition of semantic content and the judgment of semantic emotion. A common difficulty in lip-motion feature extraction is that no single extraction method serves as a general approach that represents lip-motion information comprehensively and effectively. This thesis therefore studies multimodal lip reading that integrates facial muscle physiological information. The research covers Kinect-based multimodal data acquisition, preprocessing, facial muscle model construction, muscle model mapping, feature extraction, and DenseNet-based training and recognition. First, multimodal data, including audio, color images, and depth data, were collected with Kinect V2.0 while the speaker's lips were moving. The data were then preprocessed. For the image data, face detection, lip localization, and data augmentation were performed in sequence. For the depth data, unconscious head movements made by the speaker during recording, such as turning, tilting, looking up, and bowing, were corrected. Next, the thesis studied facial muscle physiology and established a vector muscle model with a small number of parameters. Based on the 1347 acquired facial feature points, the muscle model was mapped onto the three-dimensional facial model. From this muscle model, the thesis extracted two types of features: geometric features and physiological features. The geometric features comprise shape features and angle features; the physiological features comprise muscle length features and muscle displacement features. Finally, the thesis used DenseNet for a lip-reading experiment. The results showed that adding depth information improves the recognition rate of the lip-reading system, and that the proposed physiological features strengthen the constraints among the three-dimensional discrete points and characterize the lip movement process more fully. In addition, the thesis studied tones and consonants and found that distinguishing tones and consonants from visual information alone is feasible.
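The two feature families named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so everything below is an assumption made for illustration: a muscle is treated as a vector from a fixed attachment point to an insertion point on the skin, the muscle length feature is the current length of that vector, the muscle displacement feature is the distance the insertion point has moved from its rest position, and the angle feature is the angle formed at one landmark by two others. The function names and sample coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]  # a 3D point (x, y, z), e.g. one Kinect face vertex


def angle_at(vertex: Point, a: Point, b: Point) -> float:
    """Angle in degrees at `vertex` between the rays toward `a` and `b`."""
    va = [a[i] - vertex[i] for i in range(3)]
    vb = [b[i] - vertex[i] for i in range(3)]
    norm = math.hypot(*va) * math.hypot(*vb)
    if norm == 0.0:
        return 0.0  # degenerate case: a landmark coincides with the vertex
    cos_theta = sum(x * y for x, y in zip(va, vb)) / norm
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def muscle_features(attachment: Point,
                    insertion_rest: Point,
                    insertion_now: Point) -> Tuple[float, float]:
    """Return (length, displacement) for one hypothetical vector muscle.

    length:       distance from the fixed attachment to the current insertion
    displacement: distance the insertion point moved from its rest position
    """
    length = math.dist(attachment, insertion_now)
    displacement = math.dist(insertion_rest, insertion_now)
    return length, displacement


# Illustrative values only; these are not data from the thesis.
length, displacement = muscle_features((0, 0, 0), (3, 4, 0), (6, 8, 0))
corner_angle = angle_at((0, 0, 0), (1, 0, 0), (0, 1, 0))
```

Computed per frame for each muscle and landmark triple, such values could be stacked into a feature sequence for a classifier. How the thesis actually arranges its features for DenseNet is not stated in the abstract.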
Keywords/Search Tags:Lip Reading, Facial Muscles, Physiological Feature, Kinect, DenseNet, Feature Extraction