Research On Affective Speech Synthesis

Posted on: 2014-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Lu
Full Text: PDF
GTID: 2268330422460119
Subject: Circuits and Systems

Abstract/Summary:
With the wide use of human-computer interaction systems in recent years, speech synthesis technology has attracted broad attention. Although speech synthesis has achieved good results in clarity, intelligibility, and naturalness, current human-computer interaction systems are mainly based on neutral voices and lack emotional expression. Human voice communication, however, carries not only the basic verbal content but also a large amount of rich emotional information. Emotional speech synthesis has therefore become an active topic of international research.

This thesis introduces the three-dimensional PAD (Pleasure-Arousal-Dominance) emotional model, establishes an emotional corpus covering 11 kinds of emotion, and annotates the emotional speech with PAD values. On this basis, a five-degree tone model is used to build a fundamental-frequency (F0) model of emotional speech, and a GRNN (Generalized Regression Neural Network) performs the prosody conversion to emotional speech. The thesis further applies speaker adaptive training (SAT) to achieve statistical parametric synthesis of emotional speech. The main work and contributions of the thesis are as follows:

First, we establish an emotional speech corpus. The corpus records 11 kinds of typical emotions from one female speaker: neutral, relaxed, surprised, tender, joyful, angry, anxious, disgusted, contemptuous, fearful, and sad. Adopting the three-dimensional PAD emotional model, we annotate the speech corpus with PAD values and annotate the text corpus with prosodic structure.

Second, we propose an emotional prosody conversion method based on the PAD three-dimensional emotion model. A five-degree tone model is used to build the F0 envelope model of emotional speech, and a GRNN maps emotion parameters to prosody (a minimal sketch of such a mapping is given after this abstract). Experimental results show that the maximum RMSE of the F0 envelope fitted by the five-degree tone model is less than 6.9 Hz, which meets the requirements of F0 curve modeling. Within the 95% confidence interval, the average EMOS score of emotional speech converted with the GRNN model is 3.6, showing that the converted speech can express the intended emotional information.

Finally, we propose a statistical parametric synthesis method for emotional speech based on speaker adaptive training (SAT). The thesis designs a context-dependent text annotation format and creates a question set for emotional speech. By mixing a multi-speaker Mandarin corpus with one speaker's emotional speech corpus, an average voice model is obtained through speaker adaptive training. Then, through speaker adaptive transformation using the speaker's emotional training speech, a speaker-dependent (SD) emotional speech model is derived from the average voice model, from which emotional speech can be synthesized. Experimental results show that the average EMOS score of speech synthesized with the proposed method is 2.7, which is higher than the EMOS score of a model trained only on the emotional speech.
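To make the prosody-conversion step concrete, the following is a minimal, illustrative sketch of the regression a GRNN performs: a Gaussian-kernel-weighted average of training targets (the Nadaraya-Watson form). The training pairs, the choice of prosody targets (mean F0 and a speaking-rate factor), and the smoothing factor sigma are assumptions for illustration, not values from the thesis.

    # Minimal GRNN sketch: predict prosody parameters from a PAD
    # emotion vector. All numbers below are illustrative, not from
    # the thesis.
    import numpy as np

    def grnn_predict(X_train, Y_train, x, sigma=0.3):
        """Nadaraya-Watson form of a GRNN: a kernel-weighted average
        of training targets, weighted by Gaussian distance to x."""
        d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distances
        w = np.exp(-d2 / (2.0 * sigma ** 2))      # pattern-layer weights
        return (w[:, None] * Y_train).sum(axis=0) / w.sum()

    # PAD vectors (pleasure, arousal, dominance) -> prosody targets
    # (mean F0 in Hz, speaking-rate factor); toy values.
    X_train = np.array([[ 0.0,  0.0,  0.0],   # neutral
                        [ 0.6,  0.7,  0.4],   # joy
                        [-0.5,  0.6,  0.3],   # anger
                        [-0.6, -0.4, -0.5]])  # sadness
    Y_train = np.array([[200.0, 1.00],
                        [245.0, 1.15],
                        [230.0, 1.20],
                        [180.0, 0.85]])

    print(grnn_predict(X_train, Y_train, np.array([0.5, 0.5, 0.2])))

Because the GRNN is a one-pass kernel regressor, the only quantity to tune is the smoothing factor sigma, which trades off between reproducing the training prosody exactly and interpolating smoothly between PAD points.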
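The adaptation step can likewise be sketched. The thesis performs speaker adaptive training within an HMM framework; the snippet below only illustrates the underlying idea of mapping an average-voice Gaussian mean toward a target speaker with a linear (MLLR-style) transform mu_adapted = A * mu + b. The dimension, A, and b are made-up illustrative values; in practice the transform would be estimated from the speaker's emotional adaptation data.

    # Hedged sketch of the adaptation idea behind SAT-style synthesis:
    # shift an average-voice Gaussian mean toward the target (emotional)
    # speaker with a linear transform. Values are illustrative only.
    import numpy as np

    dim = 3                                  # toy feature dimension
    mu_avg = np.array([1.0, -0.5, 0.2])      # average-voice model mean
    A = np.eye(dim) * 1.05                   # illustrative transform matrix
    b = np.array([0.1, 0.0, -0.05])          # illustrative bias

    mu_sd = A @ mu_avg + b                   # speaker-dependent adapted mean
    print(mu_sd)

In the SAT framework the same kind of transform is also applied during training to normalize away speaker differences, which is what makes the average voice model a good starting point for adaptation.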
Keywords/Search Tags: Affective Speech, PAD, Five Degree Tone Model, Prosody Conversion, Hidden Markov Models, Speaker Adaptive Training