
Detecting And Processing Visual Information In Speech Synthesis System Driven By Visual-speech

Posted on: 2008-01-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M J Wang
Full Text: PDF
GTID: 1118360272485477
Subject: Biomedical engineering
Abstract/Summary:
In order to develop a communication aid for voice-impaired people, a speech synthesis system driven by visual speech is proposed. In this system, the visual information of lip movement in the mouth region serves as a special language. The research explores several fundamental problems, such as how to correlate visual information with acoustic information, how much information can be extracted from the lip region and lip contours, how much the lip-feature parameters contribute to a robust speechreading system, and what procedure extracts lip parameters automatically and effectively.

The main research content of the dissertation involves:

1. Based on analysis of frontal-view and profile-view face images, a new model is presented that can extract the degree of lip pouting. The derivatives of several parameters describing the dynamic characteristics of the lip contour are also calculated. Experimental results on a small database of Chinese words show that the parameters from the unsymmetrical lip contour model improve the recognition rate by more than 25%. Using this model, a Mandarin Chinese visual-speech database is designed for voice-impaired people.

2. Movement detection and morphological processing are used to extract the mouth area and lip contours from the image sequences. Geometric lip features are then extracted from the mouth region, including the width of the outer lip contour W, the height of the outer lip contour H, and the projection of the pouting F. The differences of these parameters (dW/dt, dH/dt, and dF/dt) are calculated as new parameters describing the dynamic information of the lips.

3. The Discrete Fourier Transform and Discrete Cosine Transform are used to obtain descriptors of the lip contours in the unsymmetrical lip contour model automatically.
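The abstract does not spell out how the transform-domain descriptors are computed. A minimal sketch of one common approach, assuming the contour is reduced to a radial signature (distance of each sampled point from the centroid) before an orthonormal DCT-II is applied; the function name `dct_contour_descriptor` and the parameter `n_coeffs` are illustrative, not taken from the dissertation:

```python
import numpy as np

def dct2(x):
    """Orthonormal DCT-II of a 1-D signal (minimal NumPy implementation)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * (np.arange(n) + 0.5) * k / n)
    coeffs = basis @ x
    coeffs[0] *= np.sqrt(1.0 / n)
    coeffs[1:] *= np.sqrt(2.0 / n)
    return coeffs

def dct_contour_descriptor(contour_xy, n_coeffs=10):
    """DCT descriptor of a closed lip contour.

    contour_xy: (N, 2) array of (x, y) points sampled along the contour.
    n_coeffs:   number of low-frequency coefficients to keep (assumed value).
    """
    pts = np.asarray(contour_xy, dtype=float)
    centroid = pts.mean(axis=0)
    # Radial signature: distance of each contour point from the centroid.
    radii = np.linalg.norm(pts - centroid, axis=1)
    coeffs = dct2(radii)
    # Drop the DC term and divide by it, making the descriptor scale-invariant.
    return coeffs[1:n_coeffs + 1] / coeffs[0]
```

Keeping only the low-frequency coefficients compresses the contour shape into a short, fixed-length vector suitable as an HMM observation; normalizing by the DC term removes dependence on mouth size and camera distance.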
A Hidden Markov Model is trained using each set of descriptors as the eigenvector of the lip contours, and its recognition ability is tested.

4. Feature fusion is used to improve discriminative power. To combine the feature streams effectively, a weighted combination is used to balance their contributions. Geometric features of the lip region and DCT descriptors of the lip contours are combined into a new discriminant vector, with which the HMM is trained and tested. The recognition rate is analyzed under different weighting factors.

5. A second-order Hidden Markov Model is implemented to train and test the lip feature sequences. It captures more context information from the sequences than a first-order model and suits the pronunciation of Chinese. The recognition accuracies of the second-order and first-order Hidden Markov Models are compared on the same lip feature sequences.
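The weighted fusion in point 4 can be sketched as follows. This is an assumption about the mechanics, not the dissertation's exact formulation: each stream is z-score normalized so neither dominates by sheer scale, and a single hypothetical weighting factor `alpha` in [0, 1] trades the two streams off before concatenation into one observation vector:

```python
import numpy as np

def fuse_features(geometric, contour_dct, alpha=0.5):
    """Weighted concatenation of geometric lip features (W, H, F and their
    differences) with DCT contour descriptors. `alpha` is a hypothetical
    weighting factor; the recognition rate would be swept over its values."""
    def zscore(x):
        # Normalize each stream so the weighting, not raw scale, sets balance.
        x = np.asarray(x, dtype=float)
        std = x.std()
        return (x - x.mean()) / std if std > 0 else x - x.mean()

    return np.concatenate([alpha * zscore(geometric),
                           (1.0 - alpha) * zscore(contour_dct)])
```

Sweeping `alpha` and re-training the HMM at each setting reproduces the kind of weighting-factor analysis the abstract describes.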
Keywords/Search Tags:rehabilitation, speechreading, unsymmetrical lip contour model, movement detection, orthogonal transforms, feature fusion, Hidden Markov Model