
Detecting And Processing Visual Information In Speech Synthesis System Driven By Visual-speech

Posted on: 2008-01-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M J Wang
Full Text: PDF
GTID: 1118360272485477
Subject: Biomedical engineering
Abstract/Summary:
In order to develop a communication aid for voice-impaired people, a speech synthesis system driven by visual speech is proposed. In this system, the visual information of lip movement in the mouth region serves as a special language. The research explores several fundamental problems, such as how to correlate visual information with acoustic information, how much information can be extracted from the lip region and lip contours, how much the lip-feature parameters contribute to a robust speechreading system, and what procedure extracts lip parameters automatically and effectively.

The main research content of the dissertation involves:

1. Based on analysis of frontal-view and profile-view face images, a new model is presented that can extract the degree of lip pouting. The derivatives of several parameters describing the dynamic characteristics of the lip contour are also calculated. Experimental results on a small database of Chinese words show that the parameters from the unsymmetrical lip contour model improve the recognition rate by more than 25%. Using this model, a Mandarin Chinese visual-speech database is designed for voice-impaired people.

2. Movement detection and morphological processing are used to extract the mouth area and lip contours from the image sequences. Geometric lip features are then extracted from the mouth region, including the width of the outer lip contour W, the height of the outer lip contour H, and the projection of the pouting F. The differences of these parameters (dW/dt, dH/dt, and dF/dt) are calculated as new parameters describing the dynamic information of the lips.

3. The Discrete Fourier Transform and Discrete Cosine Transform are used to obtain descriptors of the lip contours in the unsymmetrical lip contour model automatically.
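The abstract does not spell out how the transform-domain descriptors are computed. A minimal sketch of one common approach, assuming the contour is reduced to a radial signature (distance of each sampled point from the centroid) before an orthonormal DCT-II is applied; the function name `dct_contour_descriptor` and the parameter `n_coeffs` are illustrative, not taken from the dissertation:

```python
import numpy as np

def dct2(x):
    """Orthonormal DCT-II of a 1-D signal (minimal NumPy implementation)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * (np.arange(n) + 0.5) * k / n)
    coeffs = basis @ x
    coeffs[0] *= np.sqrt(1.0 / n)
    coeffs[1:] *= np.sqrt(2.0 / n)
    return coeffs

def dct_contour_descriptor(contour_xy, n_coeffs=10):
    """DCT descriptor of a closed lip contour.

    contour_xy: (N, 2) array of (x, y) points sampled along the contour.
    n_coeffs:   number of low-frequency coefficients to keep (assumed value).
    """
    pts = np.asarray(contour_xy, dtype=float)
    centroid = pts.mean(axis=0)
    # Radial signature: distance of each contour point from the centroid.
    radii = np.linalg.norm(pts - centroid, axis=1)
    coeffs = dct2(radii)
    # Drop the DC term and divide by it, making the descriptor scale-invariant.
    return coeffs[1:n_coeffs + 1] / coeffs[0]
```

Keeping only the low-frequency coefficients compresses the contour shape into a short, fixed-length vector suitable as an HMM observation; normalizing by the DC term removes dependence on mouth size and camera distance.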
A Hidden Markov Model is trained using each set of descriptors as the eigenvector of the lip contours, and its recognition ability is tested.

4. Feature fusion is used to improve discriminative power. To combine the feature streams effectively, a weighted combination is used to balance their contributions. Geometric features of the lip region and DCT descriptors of the lip contours are combined into a new discriminant vector, with which the HMM is trained and tested. The recognition rate is analyzed under different weighting factors.

5. A second-order Hidden Markov Model is implemented to train and test the lip feature sequences. It captures more context information from the sequences than a first-order model and suits the pronunciation of Chinese. The recognition accuracies of the second-order and first-order Hidden Markov Models are compared on the same lip feature sequences.
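The weighted fusion in point 4 can be sketched as follows. This is an assumption about the mechanics, not the dissertation's exact formulation: each stream is z-score normalized so neither dominates by sheer scale, and a single hypothetical weighting factor `alpha` in [0, 1] trades the two streams off before concatenation into one observation vector:

```python
import numpy as np

def fuse_features(geometric, contour_dct, alpha=0.5):
    """Weighted concatenation of geometric lip features (W, H, F and their
    differences) with DCT contour descriptors. `alpha` is a hypothetical
    weighting factor; the recognition rate would be swept over its values."""
    def zscore(x):
        # Normalize each stream so the weighting, not raw scale, sets balance.
        x = np.asarray(x, dtype=float)
        std = x.std()
        return (x - x.mean()) / std if std > 0 else x - x.mean()

    return np.concatenate([alpha * zscore(geometric),
                           (1.0 - alpha) * zscore(contour_dct)])
```

Sweeping `alpha` and re-training the HMM at each setting reproduces the kind of weighting-factor analysis the abstract describes.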
Keywords/Search Tags:rehabilitation, speechreading, unsymmetrical lip contour model, movement detection, orthogonal transforms, feature fusion, Hidden Markov Model