
The Research And Realization Of Text-To-Visual Speech Synthesis System

Posted on: 2006-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: R Fu
GTID: 2168360155957960
Subject: Computer application technology
Abstract/Summary:
By combining facial information with audio speech, a Text-To-Visual Speech Synthesis System (TTVS) builds a multi-modal human-computer interface, which greatly improves the way users interact with computers and has become an active research field in recent years. To develop a more realistic TTVS that is easy to apply to web and embedded environments, our study covers the following aspects:

Firstly, to create an animation mechanism driven by Facial Animation Parameters (FAPs), based on the theory of facial animation in MPEG-4, we developed an editor for producing animation data on a standard 2D face model. Secondly, by improving the original Active Shape Model method for detecting feature points on the human face, we completed a face-fitting tool that requires only one picture of a human face. With this tool, we can apply our animation mechanism to an arbitrary human face. Thirdly, for the synthesis of speech animation, we proposed the concept of the dynamic syllable viseme, by which text is mapped to the corresponding syllable visemes. We then merged the facial expression into each syllable viseme, adjusted by speech prosodic rules. Lastly within this step, using a modified Hermite interpolation, the syllable visemes were joined to create facial speech animation with human facial expression and speech rhythm. Fourthly, based on KD2000 by iFly Cooperation, we successfully developed a TTVS that can synthesize a human face with synchronous audio speech. Finally, by adding the TTVS function to a web chat system, we implemented web-speech-animation-chat software.

Unlike former TTVS systems, the TTVS developed in this thesis is easy to apply to web and embedded environments: it has a small data footprint, is easy to port, and runs in real time.

This study is supported by the Project of Chinese National Science Foundation: Research of the synchronization among virtual human multi-modal behaviors.
Keywords/Search Tags: Text-To-Visual Speech Synthesis System, MPEG-4, Human face fitting, dynamic syllable viseme, the method to adjust the syllable viseme based on speech prosodic rules
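
The abstract mentions joining syllable visemes by Hermite interpolation but does not give the details of the thesis's modified variant. As a rough illustration only, the sketch below uses the standard cubic Hermite basis to interpolate a single animation parameter between viseme keyframes. The function names, the use of one scalar value per keyframe, and the finite-difference tangents are all assumptions made for this example and are not taken from the thesis.

```python
def hermite(p0, p1, m0, m1, t):
    """Standard cubic Hermite interpolation between p0 and p1.

    m0 and m1 are the tangents at p0 and p1; t runs from 0 to 1.
    """
    t2 = t * t
    t3 = t2 * t
    h00 = 2 * t3 - 3 * t2 + 1   # weight of the start value
    h10 = t3 - 2 * t2 + t       # weight of the start tangent
    h01 = -2 * t3 + 3 * t2      # weight of the end value
    h11 = t3 - t2               # weight of the end tangent
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


def interpolate_track(keyframes, frames_per_segment):
    """Join keyframe values into a smooth per-frame track.

    keyframes: one hypothetical parameter value per syllable viseme.
    Tangents are simple finite differences of the neighbouring
    keyframes (an assumption, not the thesis's rule).
    """
    n = len(keyframes)
    tangents = []
    for i in range(n):
        prev_value = keyframes[max(i - 1, 0)]
        next_value = keyframes[min(i + 1, n - 1)]
        tangents.append((next_value - prev_value) / 2.0)

    track = []
    for i in range(n - 1):
        for f in range(frames_per_segment):
            t = f / frames_per_segment
            track.append(
                hermite(keyframes[i], keyframes[i + 1],
                        tangents[i], tangents[i + 1], t)
            )
    track.append(keyframes[-1])  # close the track on the last keyframe
    return track


if __name__ == "__main__":
    # Hypothetical mouth-opening values for three consecutive visemes.
    print(interpolate_track([0.0, 10.0, 0.0], frames_per_segment=4))
```

In a full system one such track would presumably exist per animation parameter, with segment lengths driven by syllable durations from the speech synthesizer; that timing logic is outside this sketch.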