The Study On Key Technologies Of Realistic Chinese Visual Speech Synthesis

Posted on: 2011-06-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Zhao
Full Text: PDF
GTID: 1118360308985637
Subject: Information and Communication Engineering
Abstract/Summary:
Visual speech synthesis, also called speech animation, generates a visual image sequence from given text or speech, which can deepen people's language comprehension. The technology plays an important role in human-computer interaction, film and entertainment, information countermeasures, and other domains.

A large-scale Chinese bimodal database is designed, and a mouth segmentation approach for noisy color images is proposed. Building on these, this dissertation proposes several realistic Chinese visual speech synthesis approaches. A demonstration system is also designed, with visual speech synthesis as its key technology. The experimental results show that, for information spoofing purposes, the proposed visual speech synthesis approaches are fast, accurate, and efficient. The main contents of this dissertation are summarized as follows.

To extract the mouth area from noisy color images, a thresholding segmentation algorithm based on a peak clustering tendency test is proposed. It is composed of two algorithms: a parallel projection segmentation algorithm and a histogram-based weighted fuzzy c-means clustering algorithm. The parallel projection segmentation algorithm projects the two-dimensional histogram onto a one-dimensional histogram according to a mapping rule, and it is shown to retain the accuracy of two-dimensional histogram segmentation while achieving the real-time performance of one-dimensional histogram segmentation. The experimental results show that mouth segmentation accuracy is high, so the approach can provide accurate mouth coordinate information. The proposed approach can also be used to select mouth corpus for the bimodal database.

A large-scale Chinese bimodal database, Bi-VSSDatabase, is designed. A selection rule for the original corpus and a naming rule for the composed documents are defined. A mouth feature parameter clustering approach based on an artificial immune system is proposed. A Chinese triphone model is built that reflects the coarticulation characteristics of Chinese. Based on this triphone model, a bimodal corpus selection algorithm is proposed, and a bimodal corpus marking and segmentation approach is then designed. Several statistical indicators, such as coverage rate and coverage efficiency, are calculated. The experimental results for these indicators show that Bi-VSSDatabase can provide a sufficient, accurate, and representative bimodal corpus for realistic Chinese visual speech synthesis.

Three speech-driven visual speech synthesis approaches are proposed: a hidden Markov model (HMM) state synthesis approach, a mixing parameter synthesis approach, and a two-layer HMM synthesis approach. Two text-driven visual speech synthesis approaches are also proposed, based on HMM and on unit concatenation respectively. For the unit concatenation approach, a search procedure for concatenating units is designed and a concatenation rule is defined. The Chinese visual triphone and the Chinese dynamic viseme are used as the basic units in the training and synthesis stages respectively. The subjective and objective assessment scores of the mouth sequences synthesized from visual triphones are satisfactory, and the assessment results indicate that the proposed approaches can synthesize smooth, continuous mouth sequences. For the step after the mouth sequence has been stitched into the background video, a mouth-area inpainting approach based on the fast marching method is proposed. With this inpainting procedure, a complete, natural, and fluent talking-head video is synthesized.

Based on an improved product HMM, an objective quality assessment approach for visual speech is proposed. The approach simulates people's visual and auditory perception of the speaker and provides an objective assessment result. All the proposed visual speech synthesis approaches are compared in the assessment process. The comparison results indicate that the proposed visual speech synthesis technology can substantially enhance people's speech comprehension, especially for people with impaired hearing.
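
As an illustration of the histogram-based clustering idea mentioned for mouth segmentation, the sketch below runs a standard fuzzy c-means update over the bins of a one-dimensional histogram, weighting each bin by its pixel count. It is only a minimal sketch under stated assumptions: the abstract does not specify the dissertation's weighting scheme, mapping rule, or peak clustering tendency test, so the function names (`histogram_fcm`, `thresholds_from_centres`), the fuzzifier `m`, and the evenly spaced initial centres are all hypothetical choices, not the author's method.

```python
def histogram_fcm(hist, n_clusters=2, m=2.0, n_iter=100, tol=1e-4):
    """Fuzzy c-means over histogram bins, each bin weighted by its count.

    hist[g] is the number of pixels with value g. Clustering the occupied
    bins instead of the individual pixels keeps the cost independent of
    image size. Returns the cluster centres in ascending order.
    """
    levels = [g for g, count in enumerate(hist) if count > 0]
    if len(levels) < n_clusters:
        raise ValueError("need at least n_clusters occupied bins")

    lo, hi = levels[0], levels[-1]
    # Assumption: start with centres spread evenly over the occupied range.
    centres = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)

    for _ in range(n_iter):
        num = [0.0] * n_clusters
        den = [0.0] * n_clusters
        for g in levels:
            dists = [abs(g - c) for c in centres]
            if min(dists) == 0.0:
                # The bin coincides with a centre: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                u = [h / total for h in hits]
            else:
                # Standard FCM membership: u_k = 1 / sum_j (d_k / d_j)^(2/(m-1)).
                u = [1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
                     for d_k in dists]
            for k in range(n_clusters):
                w = hist[g] * u[k] ** m  # bin count acts as the sample weight
                num[k] += w * g
                den[k] += w
        new_centres = [num[k] / den[k] if den[k] > 0 else centres[k]
                       for k in range(n_clusters)]
        shift = max(abs(a - b) for a, b in zip(new_centres, centres))
        centres = new_centres
        if shift < tol:
            break
    return sorted(centres)


def thresholds_from_centres(centres):
    """Place one threshold midway between each pair of adjacent centres."""
    return [(a + b) / 2.0 for a, b in zip(centres, centres[1:])]
```

Calling `histogram_fcm` on a histogram with two well-separated peaks returns one centre near each peak, and `thresholds_from_centres` then yields a single cut between them that could separate mouth from non-mouth pixels in the projected feature.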
Keywords/Search Tags: Visual speech synthesis, Clustering tendency test, Bimodal database, Chinese visual triphone, Hidden Markov model, Unit concatenation, Objective assessment