
Research On 3D Visible Speech Animation Driven By Prosody Text

Posted on: 2009-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: S G Zhang
GTID: 2178360242994218
Subject: Computer application technology

Abstract/Summary:
Synthesizing realistic, accurate visible speech animation is one of the most interesting yet difficult research areas in virtual-character work, with applications ranging from improved intelligibility of speech in noisy environments to learning, training, and even certain types of therapy. In machine-learning approaches, a segment of speech animation is driven by an audio clip whose lip motions were captured synchronously. One advantage of that approach is that prosodic information is implicitly contained in the driving data, so the resulting lip-sync can be varied.

In our synthesis system, 3D speech animation is driven by a form of prosody-annotated text. Because text is legible, easily modified, and highly compressible, it is widely used on the Internet. Plain text, however, carries no information about tone, duration, or emphasis; such cues are easy to extract from recorded voice but difficult to recover by analyzing the text alone. Lacking prosodic information, earlier text-driven synthesis systems often appear inflexible and stiff.

In this paper, we focus on a novel text-driven mechanism for generating expressive three-dimensional speech animation. The basic idea is to synthesize animated faces from prosodic information that the user edits with a markup language. Our system translates these tags into control-trajectory parameters, so lip variation is visible in the continuous animation. The proposed technique uses a performance-driven approach to generate 3D dynamic visemes with a new scattered-data interpolation algorithm that yields low synthesis error. A Chinese prosody markup language (CPML) is defined to draw on existing prosody research, converting plain text into prosodically annotated text. By analyzing articulation features extracted from raw video, we build a parametric model based on an exponential formula.
The model takes the pre-obtained 3D dynamic visemes and the prosodic information as input, and outputs a segment of vivid speech animation. Experimental results show that (1) the proposed technique synthesizes animation with different effects depending on the prosodic information available, and (2) the new technique produces realistic results using less data than conventional methods.

Realistic and accurate synthesis of visual speech has long been a difficult and interesting research field within virtual-human animation, and 3D visible speech animation carries considerable theoretical significance and practical value. Because it can exploit network resources, its application environment is not confined to the PC and extends to PDAs and other mobile devices. The technology has broad applications, such as teaching the deaf and sign-language guidance for web pages, and it significantly reduces the manual labor required to produce accurate speech-driven animation.
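The scattered-data interpolation step used to build 3D dynamic visemes can be illustrated with a minimal sketch. This is not the thesis's actual algorithm: it assumes a Gaussian radial basis function kernel, and the function name, marker data, and parameter values below are hypothetical.

```python
import numpy as np

def rbf_interpolate(centers, values, queries, sigma=1.0):
    """Gaussian RBF scattered-data interpolation.

    centers: (n, d) known sample sites (e.g. captured marker configurations)
    values:  (n, k) quantities measured at those sites (e.g. vertex offsets)
    queries: (m, d) sites where interpolated values are wanted
    """
    # Pairwise squared distances between the known centers.
    d2 = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))      # (n, n) kernel matrix
    weights = np.linalg.solve(phi, values)      # fit RBF weights exactly
    # Evaluate the fitted interpolant at the query sites.
    q2 = np.sum((queries[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-q2 / (2.0 * sigma ** 2)) @ weights

# Toy example: three captured configurations in a 2D control space,
# each with a scalar lip-opening value attached.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
values = np.array([[0.0], [1.0], [0.5]])
out = rbf_interpolate(centers, values, centers)  # reproduces the samples
```

Because the Gaussian kernel matrix is positive definite for distinct centers, evaluating the interpolant back at the sample sites reproduces the measured values, which is the defining property of this family of interpolation schemes.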
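The exponential parametric model described above resembles the classic dominance-function approach to coarticulation, in which each viseme's influence decays exponentially away from its center time and nearby visemes are blended by normalized weights. The sketch below illustrates that general idea only; the function names, decay rate, and viseme values are illustrative assumptions, not taken from the thesis.

```python
import math

def dominance(t, center, magnitude=1.0, rate=4.0):
    """Exponentially decaying dominance of a viseme centered at `center`."""
    return magnitude * math.exp(-rate * abs(t - center))

def blend(t, visemes):
    """Normalized dominance-weighted blend of viseme targets.

    visemes: list of (center_time, target_value) pairs.
    """
    num = den = 0.0
    for center, target in visemes:
        w = dominance(t, center)
        num += w * target
        den += w
    return num / den

# Two visemes on a lip-opening track: closed at t=0.0, open at t=1.0.
track = [(0.0, 0.0), (1.0, 1.0)]
mid = blend(0.5, track)  # equidistant from both centers, so weights are equal
```

At the midpoint the two exponential weights coincide, so the blended value is the average of the two targets; closer to either center, that viseme dominates, producing the smooth control trajectories the abstract describes.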
Keywords/Search Tags: visible speech synthesis, facial animation, prosody model, Chinese prosody markup language