
The Selection, Alignment And Fusion Based On Static-Dynamic Multisource Features In Lipreading

Posted on: 2011-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: F Yang
GTID: 2198330338479951
Subject: Computer Science and Technology

Abstract/Summary:
Lipreading, as a new form of intelligent human-computer interaction, has recently been applied in many practical systems. Research on lip movement focuses on two problems: speaker identification and speech-content recognition; this thesis addresses the latter. Capturing the complex articulation process from simple lip video clips requires effective feature extraction methods that describe the information embedded in the clips. However, lip video clips also contain a large amount of identity-related information; representing it does not improve a lipreading system and in fact degrades its accuracy and robustness. Moreover, the identity-unrelated information can itself be confounded, non-uniform, and spread across different image layers. Accurately extracting the useful information embedded in lip video clips is therefore the starting point of this study.

To cope with this complexity, the thesis proposes describing lip video clips with multisource information of different types and characteristics. Several static features (LBP, HOG, and Gabor) describe the static information at different levels. On the other hand, compared with other pattern recognition problems, lipreading involves more dynamic information. The thesis therefore introduces the concept of the rich-information frame, identified by the accumulated dynamic information of a lip video clip, and uses optical flow to extract the dynamic information in the video. Because multisource features differ in dimension, type, and information structure, they must be aligned before they can work complementarily. This thesis proposes two criteria for multisource feature alignment.
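The idea of a rich-information frame, selected by accumulated dynamic information, can be sketched as follows. This is an illustrative reading only: the thesis's exact measure is not given here, so a simple frame-difference energy stands in for dense optical-flow magnitude, and the function names are hypothetical.

```python
import numpy as np

def frame_motion_energy(frames):
    """Per-frame-transition motion energy from absolute frame differences
    (a simple stand-in for accumulated optical-flow magnitude)."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    return diffs.sum(axis=(1, 2))  # one energy value per transition

def select_rich_information_frame(frames):
    """Pick the frame receiving the largest motion energy
    (hypothetical interpretation of the 'rich information frame')."""
    energy = frame_motion_energy(frames)
    # frame i+1 receives the energy of transition i -> i+1
    return int(np.argmax(energy)) + 1

# Toy lip-video clip: 5 frames of 8x8 pixels; frame 3 changes the most
clip = np.zeros((5, 8, 8))
clip[3, 2:6, 2:6] = 10.0  # a large appearance change enters at frame 3
idx = select_rich_information_frame(clip)  # -> 3
```

In a real system, the per-pixel differences would be replaced by a dense optical-flow field computed between consecutive lip frames.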
Based on these two criteria, the thesis introduces multisource feature alignment methods, illustrated with the two-source case. It then proposes a framework for multisource feature alignment and fusion, together with two fusion strategies for the LBP, HOG, Gabor, and optical-flow features. Finally, the proposed multisource features are compared experimentally against current mainstream features, and the results are analyzed. The alignment-and-fusion framework is extensible and places no restriction on the number or kind of features, pointing out a new path for making multiple features work complementarily. With different feature sets, it can also be applied readily to other pattern recognition problems.
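A minimal sketch of the two-source alignment-then-fusion idea is shown below. The thesis's own alignment criteria are not spelled out in this abstract, so the sketch assumes two generic steps (scale alignment by z-scoring each source, dimension alignment by projecting to a shared dimension) followed by concatenation fusion; the feature dimensions and function names are illustrative, not the thesis's.

```python
import numpy as np

def align_features(feats, target_dim=16, seed=0):
    """Align multisource features of differing dimensions:
    z-score each source (scale alignment), then project each to a
    shared dimension (dimension alignment). Illustrative only."""
    rng = np.random.default_rng(seed)
    aligned = []
    for f in feats:
        f = np.asarray(f, dtype=np.float64)
        f = (f - f.mean()) / (f.std() + 1e-12)          # scale alignment
        proj = rng.standard_normal((f.shape[-1], target_dim))
        aligned.append(f @ proj / np.sqrt(f.shape[-1]))  # dimension alignment
    return aligned

def fuse_concat(aligned):
    """Feature-level fusion by concatenating the aligned sources."""
    return np.concatenate(aligned, axis=-1)

# Toy two-source case: an LBP-like 59-dim histogram and a HOG-like
# 128-dim descriptor for 4 sample frames (dimensions for illustration)
lbp = np.random.default_rng(1).random((4, 59))
hog = np.random.default_rng(2).random((4, 128))
fused = fuse_concat(align_features([lbp, hog], target_dim=16))
# fused has shape (4, 32): two 16-dim aligned sources per sample
```

Because alignment maps every source into a common space before fusion, adding a third or fourth feature (Gabor, optical flow) only appends another entry to the input list, which is the extensibility the framework claims.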
Keywords/Search Tags: lipreading, multisource features, dynamic feature, feature alignment, feature fusion