Research On Statistical Parametric Emotional Speech Synthesis

Posted on:2017-08-28

Degree:Master

Type:Thesis

Country:China

Candidate:D L Hao

GTID:2348330488470251

Subject:Electronic Science and Technology

Abstract/Summary:

The quality of synthesized speech has improved remarkably with advances in speech synthesis technology. However, current research focuses mainly on neutral speech, and studies on emotional speech synthesis are lacking. Intelligent voice applications in daily life need to convey not only basic textual information but also a large amount of emotional information, so emotional speech synthesis is an inevitable trend in intelligent voice research. This thesis establishes an emotional speech corpus covering a variety of emotions recorded by multiple speakers. A six-level context-dependent label format is then designed for generating the context-dependent labels used in Mandarin statistical parametric speech synthesis. A speaker adaptive training algorithm is employed to train the emotional acoustic model on the multi-speaker emotional speech corpus in order to realize statistical parametric speech synthesis. The main work and contributions of the thesis are as follows.

First, the thesis establishes a multi-speaker speech corpus with 11 kinds of emotions. The speakers' emotional states were induced and their emotional speech was recorded in a professional studio. The corpus contains 11 typical emotions spoken by 7 male and 7 female speakers, with the speech signals saved in Microsoft WAV format (single-channel, 16-bit, 16 kHz sampling frequency).

Second, the thesis implements a label generation algorithm for Mandarin statistical parametric speech synthesis. A six-level context-dependent label format is designed for generating context-dependent labels, using Mandarin initials and finals as the synthesis units. A Hidden Markov Model (HMM) based statistical parametric speech synthesis system is adopted to synthesize Mandarin speech.
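As a minimal sketch, the recording format stated above (single-channel, 16-bit, 16 kHz WAV) could be checked for a corpus file with Python's standard `wave` module. The function name `matches_corpus_format` and the constants are illustrative assumptions, not part of the thesis.

```python
import wave

# Format as stated in the abstract: single-channel, 16-bit, 16 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_SAMPLE_RATE_HZ = 16000


def matches_corpus_format(path):
    """Return True if the WAV file at `path` is mono, 16-bit, 16 kHz.

    Hypothetical helper for illustration; the thesis does not describe
    how its recordings were validated.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getframerate() == EXPECTED_SAMPLE_RATE_HZ
        )
```

A check of this kind is typically run over every recording before feature extraction, since a file with a different sampling rate or channel count would yield inconsistent acoustic features.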
The influence of different label information on the quality of synthesized speech is evaluated through subjective and objective tests. The tests show that the designed six-level context-dependent label format meets the needs of Mandarin emotional speech synthesis.

Finally, the thesis proposes a statistical parametric emotional speech synthesis method that uses HMM-based statistical parametric speech synthesis with a multi-speaker, multi-emotion training corpus. An average emotional acoustic model is trained by applying the speaker adaptive training (SAT) algorithm to the multi-speaker emotional speech corpus. The target speaker's emotional speech corpus is then used to perform a speaker adaptation transformation, yielding an acoustic model of the target speaker with the target emotion, from which the target speaker's emotional speech is synthesized. The tests show that the synthesized emotional speech has good naturalness and emotional similarity.
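The abstract does not say which form the speaker adaptation transformation takes. As a hedged illustration only, the sketch below assumes an MLLR-style affine transform of the Gaussian mean vectors, mu_hat = A * mu + b, a common choice in HMM-based adaptation. The names `adapt_mean` and `adapt_model`, and the dictionary layout of the average model, are hypothetical. Estimating A and b from the target speaker's data is not shown.

```python
def adapt_mean(mean, matrix, bias):
    """Apply the affine transform mu_hat = A * mu + b to one mean vector.

    `mean` and `bias` are lists of floats; `matrix` is a list of rows.
    Assumed MLLR-style form, not confirmed by the thesis abstract.
    """
    if len(matrix) != len(bias):
        raise ValueError("matrix row count must match bias length")
    adapted = []
    for row, offset in zip(matrix, bias):
        if len(row) != len(mean):
            raise ValueError("matrix column count must match mean length")
        adapted.append(sum(a * m for a, m in zip(row, mean)) + offset)
    return adapted


def adapt_model(average_means, matrix, bias):
    """Transform every state mean of a (hypothetical) average voice model.

    `average_means` maps a state name to its mean vector.
    """
    return {
        state: adapt_mean(mean, matrix, bias)
        for state, mean in average_means.items()
    }


# Toy example: two states with 2-dimensional means.
average_model = {"state_a": [1.0, 2.0], "state_b": [0.0, -1.0]}
A = [[1.0, 0.0], [0.0, 2.0]]
b = [0.5, -1.0]
target_model = adapt_model(average_model, A, b)
```

Sharing one transform across many states is what lets a small amount of target-speaker data move the whole average model; practical systems usually tie transforms to clusters of states rather than using a single global one.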

Keywords/Search Tags:

emotional speech synthesis, emotional corpus, context-dependent information, tagging format, speaker adaptive training, statistical parametric speech synthesis

Related items

1	Research On Statistical Parametric Mandarin-Tibetan Cross-lingual Speech Synthesis
2	Research On Emotional Speech Synthesis Based On Deep Neural Network
3	Research On Statistical Parametric Speech Synthesis Based On Speaker Adaptive Training
4	Create An Emotional Speech Synthesis Corpus
5	Research And Implementation Of Speech Synthesis Method For Helping Old Robots
6	Research And Application Of Speech Synthesis Method Integrating Emotional Expressiveness
7	Research On The Modeling And Generation Of Fundamental Frequencies In Statistical Parametric Speech Synthesis
8	Research And Implementation Of Emotional Speech Synthesis System
9	Research Of Chinese Emotional Speech Synthesis Based On HMM
10	Research Of Emotional Speech Recognition And Synthesis