Research On Statistical Parametric Emotional Speech Synthesis

Posted on: 2017-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: D L Hao
GTID: 2348330488470251
Subject: Electronic Science and Technology
Abstract/Summary:
The quality of synthesized speech has improved remarkably as speech synthesis technology has advanced. However, current research focuses mainly on neutral speech, and emotional speech synthesis remains little studied. Intelligent voice applications in everyday life must convey not only basic textual information but also a large amount of emotional information, so emotional speech synthesis is an inevitable direction for intelligent voice research. This thesis builds an emotional speech corpus covering a variety of emotions recorded by multiple speakers. A six-level context-dependent label format is then designed to generate context-dependent labels for Mandarin statistical parametric speech synthesis. A speaker adaptive training algorithm is used to train an emotional acoustic model on the multi-speaker emotional speech corpus to achieve statistical parametric speech synthesis. The main work and contributions of the thesis are as follows:

Firstly, the thesis establishes a multi-speaker speech corpus with 11 kinds of emotions. The speakers' emotional states were induced, and their emotional speech was recorded in a professional studio. The corpus contains 11 typical emotions spoken by 7 male and 7 female speakers, with the speech signal saved in Microsoft WAV format (single-channel, 16-bit, 16 kHz sampling frequency).

Secondly, the thesis implements a label generation algorithm for Mandarin statistical parametric speech synthesis. A six-level context-dependent label format is designed for generating the context-dependent labels, using Mandarin initials and finals as the synthesis units. A hidden Markov model (HMM) based statistical parametric speech synthesis system is used to synthesize Mandarin speech. The influence of different label information on the quality of the synthesized speech is evaluated through subjective and objective tests. The tests show that the designed six-level context-dependent label format meets the needs of Mandarin emotional speech synthesis.

Finally, the thesis proposes a statistical parametric emotional speech synthesis method that uses HMM-based statistical parametric speech synthesis with a multi-speaker, multi-emotion training corpus. An average emotional acoustic model is trained by applying the speaker adaptive training (SAT) algorithm to the multi-speaker emotional speech corpus. The target speaker's emotional speech corpus is then used for speaker adaptation, yielding an acoustic model of the target speaker with the target emotion, from which the target speaker's emotional speech is synthesized. Tests show that the synthesized emotional speech has good naturalness and emotional similarity.
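
The recording format stated in the abstract (single-channel, 16-bit, 16 kHz WAV) is easy to check before training. The following is a minimal sketch using Python's standard `wave` module. It is not the thesis's own tooling, and the names `EXPECTED_FORMAT`, `read_wav_format`, and `matches_corpus_format` are hypothetical, introduced only for illustration.

```python
import wave

# Format stated in the abstract: single-channel, 16-bit, 16 kHz.
# The dictionary keys are illustrative names, not taken from the thesis.
EXPECTED_FORMAT = {
    "channels": 1,            # single-channel (mono)
    "sample_width_bytes": 2,  # 16 bit = 2 bytes per sample
    "sample_rate_hz": 16000,  # 16 kHz sampling frequency
}


def read_wav_format(path):
    """Return the channel count, sample width, and sample rate of a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def matches_corpus_format(path):
    """True if the file has the format the abstract gives for the corpus."""
    return read_wav_format(path) == EXPECTED_FORMAT
```

Running `matches_corpus_format` over every recording before feature extraction would flag files saved with a different channel count, bit depth, or sampling rate, any of which would otherwise skew the extracted acoustic features.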
Keywords/Search Tags: emotional speech synthesis, emotional corpus, context-dependent information, tagging format, speaker adaptive training, statistical parametric speech synthesis