
Visual Speech Recognition Based On DHMM

Posted on: 2011-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: H B Zhang
GTID: 2178360305955227
Subject: Computer software and theory
Abstract/Summary:
In recent years, a growing number of researchers have worked on computer lip-reading, with the aim of using lip identification to compensate for the weaknesses of speech recognition. In particular, speech recognition is easily degraded by noise, whereas lip-reading is not, so combining the two can improve the speech recognition rate. Lip-reading also matters for human-computer interaction: it makes interaction faster and easier, which further enhances the interactive capability of machines. Another important practical use is helping deaf people communicate more conveniently. Lip-reading draws on many related areas, such as pattern recognition and image processing, so research on it also requires study of those fields. The technique involves face detection, lip localization, lip feature extraction, clustering, feature vector recognition, and so on.

This thesis introduces the background, development status, and significance of lip-reading technology, together with the scope and methods of the study. The lip-reading system is divided into four parts: face detection, feature extraction, feature clustering, and feature recognition.

The first step of the lip-reading system is to detect the face and determine the location of the lips. We first introduce the data structures used to store images, then describe the principal image processing operations, and then introduce the main face detection algorithms and their characteristics.
Finally, we describe Haar features and the principle of the AdaBoost classifier in detail, followed by the face detection algorithm that combines Haar features with an AdaBoost classifier. We test face detection with open-source tools. Once face detection works, we apply the same method to lip detection; the only difference is the images used. We chose Haar features with an AdaBoost-based classifier because the approach is efficient and requires no a priori knowledge. The input images come from uncompressed AVI video that we recorded ourselves, read frame by frame. Because face detection technology is relatively mature and faces are relatively easy to detect, the face detection success rate is high, close to 100%. In contrast, because the lips move frequently, the lip detection rate is relatively low: the highest rate achieved in this work does not reach 80%. Although this is not high, it basically meets the requirements of a lab environment.

The next task is to extract lip feature vectors from the images. We first introduce the principal types of feature extraction methods and representative examples of each, and then describe their respective advantages and disadvantages. All subsequent recognition is based on the feature vectors obtained here, so their accuracy directly affects the recognition results. We use a pixel-based approach: we convert the 32×32 lip image (an IplImage) to grayscale and take all of its pixels as the feature vector. The advantage of the pixel-based approach is that it preserves all of the information in the lip image, so all of it can contribute to recognition.
The disadvantages are that the method is sensitive to lighting, rotation, translation, and other environmental factors, and that the feature vectors are high-dimensional. At this point each feature vector has 1024 dimensions, so we use PCA followed by LDA for dimensionality reduction. PCA projects the high-dimensional vectors onto the directions along which the data vary most and discards the directions along which they vary little. LDA then extracts from the remaining space the low-dimensional features that are most representative, that is, those that best separate the classes. Using open-source tools, we reduce the vectors to 32 dimensions with PCA and then to 10 dimensions with LDA. We use the same tools for face detection, lip detection, and feature vector extraction. Because the feature vectors depend on lip detection, a wrongly located lip region yields a wrong feature vector, and the final recognition cannot be accurate.

The extracted 10-dimensional feature vectors cannot be used directly for DHMM recognition, because a DHMM requires discrete observation symbols; we use the K-means clustering algorithm to assign each vector to a class. We first introduce several major clustering algorithms and the advantages and disadvantages of each, and then describe the principle of K-means in detail. Clustering groups similar data into one category; here, similar lip feature vectors are grouped into one class. Since a distance can be computed between feature vectors, K-means is applicable. The algorithm has parameters that must be set manually, and these have a large impact on the final results.
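The quantization step described above can be sketched in a few lines. This is a minimal illustration of plain K-means, not the thesis's implementation: the function names (`kmeans`, `quantize`, `nearest`), the fixed iteration count, and the random initialization are all assumptions made for the example. Each feature vector is replaced by the index of its nearest centroid, which gives the discrete symbol sequence a DHMM needs.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, centroids):
    """Index of the centroid closest to `vector`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(vector, centroids[i]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain K-means; k, iterations and seed are hand-set parameters (assumed values)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct training vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        # Assignment step: group every vector with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def quantize(sequence, centroids):
    """Map a sequence of feature vectors to a sequence of discrete codeword indices."""
    return [nearest(v, centroids) for v in sequence]


# Toy 2-D data standing in for the 10-dimensional lip features.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
codebook = kmeans(toy, k=2)
symbols = quantize(toy, codebook)
```

In this sketch `k` plays the role of the number of clusters, so changing it changes the size of the DHMM's observation alphabet.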
We set the number of clusters manually, typically to 32, 64, 96, 128, 160, 192, or 256. Our experiments suggest that the number of classes has a great impact on the results, so this parameter is very important.

Finally, we build the DHMM models and carry out training and recognition. We first introduce some common recognition models, for example model-based matching, neural networks, and HMMs. Early lip-reading work mostly used model-based matching, but because of the doubly stochastic nature of the HMM, more and more researchers use HMMs to study lip-reading. We introduce the three basic HMM problems (evaluation, decoding, and learning) and the algorithms that solve them. We then introduce the principles of the DHMM and the steps to implement it. DHMM training uses the forward-backward and Baum-Welch algorithms, while recognition mainly uses the forward algorithm. Finally, we use the clustered feature vectors to train the DHMMs and perform recognition; the overall recognition rate on 10 words is 64%. However, the recognition rate varies greatly from word to word. The ten words are wo, lai, zai, zi, yi, xiao, bu, bie, de, and shang, with recognition rates of 90%, 80%, 60%, 100%, 40%, 50%, 90%, 20%, 40%, and 70% respectively. We also find that different parameter settings can affect the recognition results, sometimes considerably. Given the limited experimental conditions, we can only offer a tentative analysis of the reasons behind these results.
We believe the main reasons are as follows. First, when lip detection fails, the feature vectors obtained are wrong and recognition is seriously affected. Second, some information is lost in dimensionality reduction; although the loss is small, it must have some effect on the results. Third, feature vector optimization and DHMM training are separate processes that are not organically linked, which affects the recognition rate. Fourth, the lack of training data leaves the DHMMs insufficiently trained, so the recognition results are not very good. Fifth, some parameters must be set manually, and they have a huge impact.

In summary, we have built a simple DHMM-based isolated word recognition system. We describe each module in detail, along with its principles and techniques. We conduct a simple experiment and obtain a 64% recognition rate on 10 isolated words. Finally, we analyze some of the factors affecting the lip-reading recognition rate, to facilitate future improvements to the system.
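The recognition step, scoring a codeword sequence against each word's DHMM with the forward algorithm and picking the best-scoring word, might look roughly like the sketch below. It is an illustration under stated assumptions, not the thesis's code: the function names, the use of per-step scaling to avoid underflow, and every number in the two toy models are invented for the example. Only the word labels "wo" and "lai" come from the vocabulary listed above.

```python
import math


def forward_log_likelihood(pi, A, B, observations):
    """Return log P(observations | model) using the scaled forward algorithm.

    pi[i]    initial probability of state i
    A[i][j]  transition probability from state i to state j
    B[i][o]  probability that state i emits discrete symbol o
    Assumes `observations` is a non-empty list of symbol indices.
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(observations)):
        if t > 0:
            o = observations[t]
            # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Rescale so alpha sums to 1 and accumulate the log of the scale factors.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob


def recognize(models, observations):
    """Return the word whose model gives the observations the highest likelihood."""
    return max(models,
               key=lambda word: forward_log_likelihood(*models[word], observations))


# Hypothetical two-state, two-symbol models; all parameters are made up.
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
toy_models = {
    "wo":  ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.8, 0.2]]),
    "lai": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.2, 0.8]]),
}
best_word = recognize(toy_models, [0, 0, 0])
```

In a full system each word's `pi`, `A`, and `B` would come from Baum-Welch training on that word's codeword sequences, and the alphabet size of `B` would equal the number of K-means clusters.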
Keywords/Search Tags: human face detection, lip positioning, feature extraction, lip reading, clustering, DHMM