
Research On Bimodal Emotional Chinese Speech Synthesis

Posted on: 2012-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Yuan
GTID: 2218330338461767
Subject: Signal and Information Processing

Abstract/Summary:
The goal of a next-generation speech synthesis system is to deliver semantic information exactly and vividly through clear, natural synthetic speech. The main task of bimodal emotional speech synthesis is to give the computer the ability to synthesize natural emotional speech together with realistic facial expressions by establishing a virtual avatar. Bimodal speech synthesis and speech recognition are core technologies for realizing human-machine interaction, and they have important application value in information processing.

The thesis concentrates on 3D face model construction and rendering, animation-driving approaches, emotional prosodic feature modeling, and speech synthesis based on the Pitch Synchronous Overlap-Add (PSOLA) algorithm.

For facial modeling, a VRML model is parsed and rendered with OpenGL. The face model consists of 7 components containing 6435 vertices and 12280 faces in total. The model used in this work is more complex than those in related research and achieves more life-like facial detail.

Two animation-driving methods are compared: parameter control and data-driven animation. Motion problems of the teeth, tongue, and throat are resolved by improving the data collection approach. In the FAP parameter control method based on MPEG-4, a radial basis function and a raised cosine function are chosen to control the mouth and the facial expression, respectively. The data-driven method, based on key-frame interpolation, uses a cubic polynomial to interpolate between key frames and then composes viseme and expression frames by vector-weighted superposition to generate continuous animation. Results show that the FAP parameter method can achieve subtle changes of expression and lip shape, while the data-driven approach can produce new expressions by fusing key frames.

To improve the naturalness of the synthetic speech, the waveform concatenation algorithm is modified: a prosody prediction unit and a PSOLA-based prosody modification unit are added.
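The data-driven animation pipeline described above (cubic interpolation between key frames, then vector-weighted superposition of viseme and expression frames) can be sketched roughly as follows. The frame representation, the specific cubic easing curve, and the blend weights are illustrative assumptions, not the thesis's actual data structures or coefficients:

```python
import numpy as np

def cubic_ease(t):
    """Cubic polynomial 3t^2 - 2t^3: zero velocity at both endpoints,
    so motion starts and stops smoothly between key frames."""
    return 3 * t**2 - 2 * t**3

def interpolate_keyframes(frame_a, frame_b, n_steps):
    """Generate intermediate frames between two key frames along a cubic curve."""
    ts = np.linspace(0.0, 1.0, n_steps)
    return [frame_a + cubic_ease(t) * (frame_b - frame_a) for t in ts]

def blend(viseme_frame, expression_frame, w_viseme, w_expression):
    """Vector-weighted superposition of a viseme frame and an expression frame."""
    return w_viseme * viseme_frame + w_expression * expression_frame

# Toy 3-vertex "face" stored as displacements from neutral (hypothetical data).
neutral = np.zeros(3)
mouth_open = np.array([0.0, 1.0, 0.0])   # viseme key frame
smile = np.array([0.5, 0.0, 0.5])        # expression key frame

target = blend(mouth_open, smile, 0.7, 0.3)
frames = interpolate_keyframes(neutral, target, 5)
```

A real system would apply the same interpolation per-vertex over the full 6435-vertex mesh; the mechanics are identical.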
In the synthesis stage, units are selected by combining a decision tree with a cost function. Simulation results show that the synthetic speech conveys the intended emotion with a natural voice. The thesis realizes a bimodal Mandarin emotional TTS system that meets real-time animation requirements despite the large data volume; the synthesized speech expresses emotional information accurately and vividly in both the visual and audio channels.
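Cost-function unit selection of this kind is typically solved with dynamic programming: each target position keeps the cheapest cumulative cost of ending at each candidate unit, summing a target cost and a concatenation cost. The sketch below is a generic version of that search with made-up toy costs; the thesis's actual decision-tree pre-selection and cost definitions are not specified here:

```python
def select_units(targets, candidates, target_cost, concat_cost):
    """Viterbi-style unit selection.

    candidates[i] is the list of candidate units for target i (e.g. units
    pre-selected by a decision tree). Returns the unit sequence minimizing
    the total of target costs plus concatenation costs between neighbors.
    """
    n = len(targets)
    # best[i][j] = (cheapest cumulative cost ending at candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, n):
        row = []
        for c in candidates[i]:
            tc = target_cost(targets[i], c)
            cost, back = min(
                (best[i - 1][k][0] + concat_cost(prev, c) + tc, k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost, back))
        best.append(row)
    # Backtrack the cheapest path from the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))

# Toy example: units are plain numbers, costs are absolute differences.
chosen = select_units(
    targets=[1, 5, 3],
    candidates=[[0, 1, 2], [4, 6], [3, 9]],
    target_cost=lambda t, c: abs(t - c),
    concat_cost=lambda a, b: abs(a - b),
)
```

Real systems replace the scalar costs with weighted sums over prosodic and spectral features, but the search structure is the same.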
Keywords/Search Tags:Bimodal Speech, Facial Animation, Text-to-Emotional Speech