Research And Implementation On Chinese Text-To-Visual Speech Synthesis System (TTVS)

Posted on: 2011-10-27   Degree: Master   Type: Thesis
Country: China   Candidate: C Z Rong   Full Text: PDF
GTID: 2178360305451810   Subject: Signal and Information Processing
Abstract/Summary:
With the development of computer technology and related fields, speech synthesis has advanced considerably, and many new theories and techniques have emerged. Expectations for speech synthesis have risen accordingly. Human understanding of speech is multi-modal: in many situations we do not only listen to the voice, but also watch the speaker's facial movements and expressions. A system feels friendlier when it presents not just speech but also a talking head.

This thesis focuses on three-dimensional facial model construction and Chinese visemes. We first build the facial model with 3ds Max and then use OpenGL to render the 3D human face in a VC++ environment. We also apply Level of Detail (LOD) to remove unnecessary lines and faces, which yields the three-dimensional face model we need. We then texture the model so that it acquires the facial features of skin, eyes and hair and looks more realistic.

On the visual speech side, after studying the motion of the speaker's visible articulators, we establish a basic lip-shape set consisting of eleven basic lip shapes. A lip-shape set for the rhymes is then constructed linearly from the basic lip-shape set. Having defined the basic lip-shape set, we need to select parameters to describe it. For generality and flexibility of description, we use the Facial Animation Parameters (FAPs) defined by MPEG-4. Because the position of the mouth changes during pronunciation, we choose 24 FAPs to describe each basic lip shape. Once the FAPs are obtained, they drive the 3D face model to produce the corresponding lip shape.

To verify the performance, we implemented a TTVS system. The listening test indicates that the output speech is natural, and the visual transitions between lip shapes are natural as well. The friendliness and convenience of human-computer interaction are improved.
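
The linear construction of lip shapes from the basic set, and the FAP-driven transition between them, can be illustrated with a minimal Python sketch. It is not the thesis's VC++ implementation: the function names, the blend weights, and the use of plain linear interpolation for transitions are assumptions made for illustration only. The one value taken from the abstract is that each lip shape is described by 24 FAPs.

```python
from typing import List, Sequence

NUM_FAPS = 24  # from the abstract: 24 FAPs describe each basic lip shape


def blend_lip_shapes(basis: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Build a lip shape as a linear combination of basic lip shapes.

    Each basic lip shape is a vector of FAP values. The weights are
    hypothetical; the abstract does not state how they are chosen.
    """
    if len(basis) != len(weights):
        raise ValueError("need exactly one weight per basic lip shape")
    n = len(basis[0])
    if any(len(shape) != n for shape in basis):
        raise ValueError("all lip shapes must have the same number of FAPs")
    return [sum(w * shape[i] for shape, w in zip(basis, weights))
            for i in range(n)]


def interpolate_faps(start: Sequence[float], end: Sequence[float],
                     t: float) -> List[float]:
    """Linearly interpolate between two FAP vectors for t in [0, 1].

    Linear interpolation is an assumed transition scheme, not one
    confirmed by the abstract.
    """
    if len(start) != len(end):
        raise ValueError("FAP vectors must have the same length")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1.0 - t) * a + t * b for a, b in zip(start, end)]


def transition_frames(start: Sequence[float], end: Sequence[float],
                      steps: int) -> List[List[float]]:
    """Return steps + 1 FAP vectors going from start to end inclusive."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [interpolate_faps(start, end, k / steps) for k in range(steps + 1)]
```

In a renderer, each FAP vector returned by `transition_frames` would be applied to the face model once per animation frame; how FAP values map to vertex displacements depends on the model and is outside this sketch.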
Keywords/Search Tags: Text-To-Speech, Visual Speech, Face Model, Viseme, MPEG-4