The Research And Realization Of Text-To-Visual Speech Synthesis System

Posted on:2006-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:R Fu

GTID:2168360155957960

Subject:Computer application technology

Abstract/Summary:

Text-To-Visual Speech Synthesis (TTVS) systems combine facial information with audio speech to build a multi-modal human-computer interface. This greatly improves the way users interact with computers, and TTVS has become an active research field in recent years. To develop a more realistic TTVS system that is easy to apply in web and embedded environments, our study covers the following aspects.

First, to create an animation mechanism driven by FAPs (Facial Animation Parameters), based on the theory of facial animation in MPEG-4, we developed an editor for producing animation data on a standard 2D face model. Second, by improving the original Active Shape Model method for detecting feature points on a human face, we completed a face-fitting tool that needs only a single face picture. With this tool, our animation mechanism can be applied to an arbitrary human face. Third, for the synthesis of speech animation, we proposed the concept of the dynamic syllable viseme, by which text is mapped to the corresponding syllable visemes. We then merged facial expression into each syllable viseme, adjusted by speech prosodic rules. Using a modified Hermite interpolation, the syllable visemes were then joined to create facial speech animation with human facial expression and speech rhythm. Fourth, based on KD2000 by iFly Corporation, we developed a TTVS system that synthesizes a human face with synchronized audio speech. Finally, by adding the TTVS function to a web chat system, we implemented web speech-animation chat software.

Unlike earlier TTVS systems, the system developed in this thesis is easy to apply in web and embedded environments: it has a small data footprint, is easy to port, and runs in real time. This study is supported by a project of the Chinese National Science Foundation: Research on the synchronization among virtual human multi-modal behaviors.
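To illustrate the step in which syllable visemes are joined by interpolation, here is a minimal sketch in Python. It uses the standard cubic Hermite basis with zero tangents at each keyframe. The abstract does not describe the thesis's own modification of Hermite interpolation, so this is not the author's method. The function names, the single scalar parameter (for example a mouth-opening value), and the fixed number of frames per segment are all assumptions made for illustration.

```python
def hermite(p0, p1, m0, m1, t):
    """Standard cubic Hermite interpolation from p0 to p1 for t in [0, 1].

    m0 and m1 are the tangents at p0 and p1.
    """
    t2 = t * t
    t3 = t2 * t
    h00 = 2 * t3 - 3 * t2 + 1   # weight of the start value
    h10 = t3 - 2 * t2 + t       # weight of the start tangent
    h01 = -2 * t3 + 3 * t2      # weight of the end value
    h11 = t3 - t2               # weight of the end tangent
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


def blend_keyframes(keys, frames_per_segment):
    """Join consecutive keyframe values into one smooth curve.

    keys: hypothetical values of one animation parameter, one per
          syllable viseme (an assumption, not the thesis's data format).
    Zero tangents are assumed, so the curve eases in and out of each key.
    """
    if len(keys) < 2:
        return list(keys)
    frames = []
    for start, end in zip(keys, keys[1:]):
        for i in range(frames_per_segment):
            t = i / frames_per_segment
            frames.append(hermite(start, end, 0.0, 0.0, t))
    frames.append(keys[-1])  # close the curve on the last keyframe
    return frames


if __name__ == "__main__":
    # Three hypothetical mouth-opening keys: closed, open, closed.
    for value in blend_keyframes([0.0, 1.0, 0.0], 4):
        print(round(value, 3))
```

With zero tangents the curve passes through every keyframe value and has zero slope there, which avoids abrupt jumps between neighbouring visemes. A prosody-aware variant would presumably vary the tangents or segment lengths per syllable, but the abstract gives no details on that.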

Keywords/Search Tags:

Text-To-Visual Speech Synthesis System, MPEG-4, human face fitting, dynamic syllable viseme, syllable viseme adjustment based on speech prosodic rules

Related items

1	Research And Implementation On Chinese Text-To-Visual Speech Synthesis System (TTVS)
2	An Approach For Driving Cartoon Animation By Combination Of Speech And Text And Its Implementation On Moving Entertainment
3	Visual Speech Synthesis Technology And Its Application Studies In English Pronunciation Tutoring
4	Research On Key Technologies Of Tibetan Text-to-speech Conversion System
5	A Study On Speech Driven Human Face Modeling And Animation
6	The Research Of Prosodic Control Algorithm And Realization For Chinese Speech Synthesis
7	Text Analysis Of Burmese Language For Speech Synthesis
8	The representation of prosodic and syllabic structure in speech production
9	Malay Text Analysis For Speech Synthesis
10	Mandarin Syllable Recognition System Based On Asat Frame