
Research On 3D Visible Speech Animation Driven By Prosody Text

Posted on: 2009-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: S G Zhang
GTID: 2178360242994218
Subject: Computer application technology

Abstract/Summary:
Synthesizing realistic, accurate visible speech animation is one of the most interesting yet difficult research areas in virtual-character work, with applications ranging from improved intelligibility of speech in noisy environments to learning, training, and even certain types of therapy. In machine-learning approaches, a segment of speech animation is driven by an audio clip whose lip motions were captured synchronously. One advantage of that approach is that prosodic information is implicitly contained in the driving data, so the resulting lip-sync can be varied.

In our synthesis system, 3D speech animation is driven by a form of prosody-annotated text. Because text is legible, easily modified, and highly compressible, it is widely used on the Internet. Plain text, however, carries no information about tone, duration, or emphasis; such cues are easy to extract from recorded voice but difficult to recover by analyzing the text alone. Lacking prosodic information, earlier text-driven synthesis systems often appear inflexible and stiff.

In this paper, we focus on a novel text-driven mechanism for generating expressive three-dimensional speech animation. The basic idea is to synthesize animated faces from prosodic information that the user edits with a markup language. Our system translates these tags into control-trajectory parameters, so lip variation is visible in the continuous animation. The proposed technique uses a performance-driven approach to generate 3D dynamic visemes with a new scattered-data interpolation algorithm that yields low synthesis error. A Chinese prosody markup language (CPML) is defined to draw on existing prosody research, converting plain text into prosodically annotated text. By analyzing articulation features extracted from raw video, we build a parametric model based on an exponential formula.
The model takes the pre-obtained 3D dynamic visemes and the prosodic information as input, and outputs a segment of vivid speech animation. Experimental results show that (1) the proposed technique synthesizes animation with different effects depending on the prosodic information available, and (2) the new technique produces realistic results using less data than conventional methods.

Realistic and accurate synthesis of visual speech has long been a difficult and interesting research field within virtual-human animation, and 3D visible speech animation carries considerable theoretical significance and practical value. Because it can exploit network resources, its application environment is not confined to the PC and extends to PDAs and other mobile devices. The technology has broad applications, such as teaching the deaf and sign-language guidance for web pages, and it significantly reduces the manual labor required to produce accurate speech-driven animation.
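The scattered-data interpolation step used to build 3D dynamic visemes can be illustrated with a minimal sketch. This is not the thesis's actual algorithm: it assumes a Gaussian radial basis function kernel, and the function name, marker data, and parameter values below are hypothetical.

```python
import numpy as np

def rbf_interpolate(centers, values, queries, sigma=1.0):
    """Gaussian RBF scattered-data interpolation.

    centers: (n, d) known sample sites (e.g. captured marker configurations)
    values:  (n, k) quantities measured at those sites (e.g. vertex offsets)
    queries: (m, d) sites where interpolated values are wanted
    """
    # Pairwise squared distances between the known centers.
    d2 = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))      # (n, n) kernel matrix
    weights = np.linalg.solve(phi, values)      # fit RBF weights exactly
    # Evaluate the fitted interpolant at the query sites.
    q2 = np.sum((queries[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-q2 / (2.0 * sigma ** 2)) @ weights

# Toy example: three captured configurations in a 2D control space,
# each with a scalar lip-opening value attached.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
values = np.array([[0.0], [1.0], [0.5]])
out = rbf_interpolate(centers, values, centers)  # reproduces the samples
```

Because the Gaussian kernel matrix is positive definite for distinct centers, evaluating the interpolant back at the sample sites reproduces the measured values, which is the defining property of this family of interpolation schemes.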
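The exponential parametric model described above resembles the classic dominance-function approach to coarticulation, in which each viseme's influence decays exponentially away from its center time and nearby visemes are blended by normalized weights. The sketch below illustrates that general idea only; the function names, decay rate, and viseme values are illustrative assumptions, not taken from the thesis.

```python
import math

def dominance(t, center, magnitude=1.0, rate=4.0):
    """Exponentially decaying dominance of a viseme centered at `center`."""
    return magnitude * math.exp(-rate * abs(t - center))

def blend(t, visemes):
    """Normalized dominance-weighted blend of viseme targets.

    visemes: list of (center_time, target_value) pairs.
    """
    num = den = 0.0
    for center, target in visemes:
        w = dominance(t, center)
        num += w * target
        den += w
    return num / den

# Two visemes on a lip-opening track: closed at t=0.0, open at t=1.0.
track = [(0.0, 0.0), (1.0, 1.0)]
mid = blend(0.5, track)  # equidistant from both centers, so weights are equal
```

At the midpoint the two exponential weights coincide, so the blended value is the average of the two targets; closer to either center, that viseme dominates, producing the smooth control trajectories the abstract describes.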
Keywords/Search Tags: visible speech synthesis, facial animation, prosody model, Chinese prosody markup language