
The Selection, Alignment And Fusion Based On Static-Dynamic Multisource Features In Lipreading

Posted on: 2011-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: F Yang
GTID: 2198330338479951
Subject: Computer Science and Technology

Abstract/Summary:
Lipreading, as a new form of intelligent human-computer interaction, has recently been applied in many practical systems. Research on lip movement focuses on two problems: speaker identification and speech-content recognition; this thesis addresses the latter. Capturing the complex articulation process from simple lip video clips requires effective feature extraction methods that describe the information embedded in the clips. However, lip video clips also contain a large amount of identity-related information; representing it does not improve a lipreading system and in fact degrades its accuracy and robustness. Moreover, the identity-unrelated information can itself be confounded, non-uniform, and spread across different image layers. Accurately extracting the useful information embedded in lip video clips is therefore the starting point of this study.

To cope with this complexity, the thesis proposes describing lip video clips with multisource information of different types and characteristics. Several static features (LBP, HOG, and Gabor) describe the static information at different levels. On the other hand, compared with other pattern recognition problems, lipreading involves more dynamic information. The thesis therefore introduces the concept of the rich-information frame, identified by the accumulated dynamic information of a lip video clip, and uses optical flow to extract the dynamic information in the video. Because multisource features differ in dimension, type, and information structure, they must be aligned before they can work complementarily. This thesis proposes two criteria for multisource feature alignment.
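The idea of a rich-information frame, selected by accumulated dynamic information, can be sketched as follows. This is an illustrative reading only: the thesis's exact measure is not given here, so a simple frame-difference energy stands in for dense optical-flow magnitude, and the function names are hypothetical.

```python
import numpy as np

def frame_motion_energy(frames):
    """Per-frame-transition motion energy from absolute frame differences
    (a simple stand-in for accumulated optical-flow magnitude)."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    return diffs.sum(axis=(1, 2))  # one energy value per transition

def select_rich_information_frame(frames):
    """Pick the frame receiving the largest motion energy
    (hypothetical interpretation of the 'rich information frame')."""
    energy = frame_motion_energy(frames)
    # frame i+1 receives the energy of transition i -> i+1
    return int(np.argmax(energy)) + 1

# Toy lip-video clip: 5 frames of 8x8 pixels; frame 3 changes the most
clip = np.zeros((5, 8, 8))
clip[3, 2:6, 2:6] = 10.0  # a large appearance change enters at frame 3
idx = select_rich_information_frame(clip)  # -> 3
```

In a real system, the per-pixel differences would be replaced by a dense optical-flow field computed between consecutive lip frames.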
Based on these two criteria, the thesis introduces multisource feature alignment methods, illustrated with the two-source case. It then proposes a framework for multisource feature alignment and fusion, together with two fusion strategies for the LBP, HOG, Gabor, and optical-flow features. Finally, the proposed multisource features are compared experimentally against current mainstream features, and the results are analyzed. The alignment-and-fusion framework is extensible and places no restriction on the number or kind of features, pointing out a new path for making multiple features work complementarily. With different feature sets, it can also be applied readily to other pattern recognition problems.
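A minimal sketch of the two-source alignment-then-fusion idea is shown below. The thesis's own alignment criteria are not spelled out in this abstract, so the sketch assumes two generic steps (scale alignment by z-scoring each source, dimension alignment by projecting to a shared dimension) followed by concatenation fusion; the feature dimensions and function names are illustrative, not the thesis's.

```python
import numpy as np

def align_features(feats, target_dim=16, seed=0):
    """Align multisource features of differing dimensions:
    z-score each source (scale alignment), then project each to a
    shared dimension (dimension alignment). Illustrative only."""
    rng = np.random.default_rng(seed)
    aligned = []
    for f in feats:
        f = np.asarray(f, dtype=np.float64)
        f = (f - f.mean()) / (f.std() + 1e-12)          # scale alignment
        proj = rng.standard_normal((f.shape[-1], target_dim))
        aligned.append(f @ proj / np.sqrt(f.shape[-1]))  # dimension alignment
    return aligned

def fuse_concat(aligned):
    """Feature-level fusion by concatenating the aligned sources."""
    return np.concatenate(aligned, axis=-1)

# Toy two-source case: an LBP-like 59-dim histogram and a HOG-like
# 128-dim descriptor for 4 sample frames (dimensions for illustration)
lbp = np.random.default_rng(1).random((4, 59))
hog = np.random.default_rng(2).random((4, 128))
fused = fuse_concat(align_features([lbp, hog], target_dim=16))
# fused has shape (4, 32): two 16-dim aligned sources per sample
```

Because alignment maps every source into a common space before fusion, adding a third or fourth feature (Gabor, optical flow) only appends another entry to the input list, which is the extensibility the framework claims.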
Keywords/Search Tags: lipreading, multisource features, dynamic feature, feature alignment, feature fusion