Research And Implementation On Chinese Text-To-Visual Speech Synthesis System (TTVS)

Posted on:2011-10-27

Degree:Master

Type:Thesis

Country:China

Candidate:C Z Rong

Full Text:PDF

GTID:2178360305451810

Subject:Signal and Information Processing

Abstract/Summary:

With the development of computer technology and related fields, speech synthesis has advanced considerably, and many new theories and techniques have emerged. Expectations for speech synthesis have risen accordingly. Human understanding of speech is multimodal: in many situations we do not only listen to the voice, but also watch the speaker's facial movements and expressions. Interaction feels friendlier when the user faces a talking head rather than speech alone.

This thesis focuses on three-dimensional facial model construction and Chinese visemes. We first build a facial model with 3ds Max, then use OpenGL to render the 3D human face in a VC++ environment. Level of Detail (LOD) techniques are used to remove unnecessary lines and faces, which yields the three-dimensional face model we need. The model is then textured so that it shows the facial features of skin, eyes and hair and looks more realistic.

On the visual speech side, after studying the motion of the speaker's visible speech organs, we establish a basic lip-shape set consisting of eleven basic lip shapes. A lip-shape set for the rhymes is then constructed linearly from the basic lip-shape set. Having defined the basic lip shapes, we need parameters to describe them. For generality and flexibility of description, we use the Facial Animation Parameters (FAPs) defined by MPEG-4. Because the position of the mouth is affected during pronunciation, we choose 24 FAPs to describe each basic lip shape. Once the FAPs are obtained, they are used to drive the 3D face model and produce the corresponding lip shape.

To verify performance, we implemented a TTVS system. A listening test indicates that the output speech is natural, and the visual transitions between lip shapes are natural as well. The friendliness and convenience of human-computer interaction are improved.
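
As an illustration of the lip-shape pipeline described above, the following Python sketch shows one way a rhyme lip shape could be formed as a linear combination of basic lip shapes, and how in-between frames could be generated for a transition. It is a sketch under stated assumptions, not the thesis's implementation: the abstract gives only the vector length (24 FAPs) and the fact that the construction is linear. The function names, the shape names "closed" and "open", the blending weights, the FAP values, and the use of plain linear interpolation for transitions are all hypothetical.

```python
from typing import Dict, List

NUM_FAPS = 24  # the abstract states that 24 FAPs describe each basic lip shape

FapVector = List[float]


def blend_lip_shapes(basic: Dict[str, FapVector],
                     weights: Dict[str, float]) -> FapVector:
    """Return the weighted sum of the named basic lip shapes.

    The weights are hypothetical; the abstract only says the rhyme
    lip shapes are built linearly from the basic set.
    """
    blended = [0.0] * NUM_FAPS
    for name, weight in weights.items():
        shape = basic[name]
        if len(shape) != NUM_FAPS:
            raise ValueError(f"lip shape {name!r} must have {NUM_FAPS} FAPs")
        for i, value in enumerate(shape):
            blended[i] += weight * value
    return blended


def transition_frames(start: FapVector, end: FapVector,
                      n_frames: int) -> List[FapVector]:
    """Linearly interpolate from start to end, endpoints included.

    Linear interpolation is an assumption; the abstract does not say
    how transitions between lip shapes are computed.
    """
    if n_frames < 2:
        raise ValueError("need at least two frames (start and end)")
    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)
        frames.append([(1 - t) * s + t * e for s, e in zip(start, end)])
    return frames


# Placeholder data: two made-up basic lip shapes with made-up FAP values.
basic_shapes = {
    "closed": [0.0] * NUM_FAPS,
    "open": [100.0] * NUM_FAPS,
}

# A hypothetical rhyme lip shape halfway between the two basic shapes.
half_open = blend_lip_shapes(basic_shapes, {"closed": 0.5, "open": 0.5})

# Five animation frames moving from the closed shape to the blended shape.
frames = transition_frames(basic_shapes["closed"], half_open, 5)
```

Each resulting frame is a 24-element FAP vector that a renderer could apply to the face model. A real system would use measured FAP values for all eleven basic lip shapes and might prefer a smoother interpolation curve.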

Keywords/Search Tags:

Text-To-Speech, Visual Speech, Face Model, Viseme, MPEG-4

Related items

1	The Research And Realization Of Text-To-Visual Speech Synthesis System
2	Visual Speech Synthesis Technology And Its Application Studies In English Pronunciation Tutoring
3	A Study On Speech Driven Human Face Modeling And Animation
4	The Study And Application Of Text-to-Speech System
5	Research On Speech Separation Based On Visual Assistance
6	Research On Problems Of Text-To-Speech System
7	An Approach For Driving Cartoon Animation By Combination Of Speech And Text And Its Implementation On Moving Entertainment
8	Research On Unannotated Long Chinese Speech Text-speech Alignment
9	Research On Crucial Techniques In Chinese Text To Speech System
10	A Study On Speech Synthesis And Visual Speech Synthesis Based On Neural Networks