Research On Statistical Parametric Speech Synthesis Based On Speaker Adaptive Training

Posted on:2014-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:W L Song

GTID:2268330422459603

Subject:Electronics and Communications Engineering

Abstract/Summary:

The performance of synthetic speech has improved in recent years. To fully convey the information contained in speech signals, a Text-To-Speech (TTS) system needs to generate more natural speech with the voice characteristics of an arbitrary speaker. Although a waveform-concatenation speech synthesis system based on a large corpus can synthesize speech with high intelligibility and naturalness, recording and preparing a large speech database is time-consuming and laborious, and it is difficult to produce synthetic speech for different speakers and various emotions. An HMM (Hidden Markov Model) based statistical parametric speech synthesis system can generate the voices of various speakers, but the naturalness of the synthetic speech is not very good. Therefore, speech synthesis based on speaker adaptation, which can synthesize speech with the target speaker's voice characteristics from a small amount of training speech uttered by that speaker, has become an important research topic. This thesis studies the HMM-based statistical parametric speech synthesis system. A Mandarin speech synthesis system was built using speaker adaptive training and speaker adaptation transforms. The system can generate synthetic speech with the characteristics of different target speakers. Experimental results showed that the proposed system performs better than a system that synthesizes speech from a speaker-dependent (SD) model trained on only one speaker's data. The main work and achievements are as follows.

Firstly, the initials and finals of Mandarin, the basic units of Mandarin pronunciation, were classified according to the "Modern Chinese Dictionary". The context-dependent information of the initials and finals was extracted from the sentences. Based on an analysis of this sentence-level context information, an annotation scheme was designed for labeling it. The context information is divided into six layers, and the label format and label content of each layer are determined according to the context-dependent information. An HTS-oriented labeling program was developed to generate full context-dependent labels.

Secondly, a question set generation program was developed for the HTS system according to the pronunciation characteristics and prosodic features of standard Mandarin. It can generate questions related to segmental features and rhythm features, which the HTS system uses for decision tree clustering of the model states.

Thirdly, a multi-speaker speech synthesis system based on HMM was implemented using the speaker adaptive training method. The system trains an average voice model from the training sentences of several speakers, and it can then obtain a model for each target speaker by applying the speaker adaptation transform. Experimental results show that the MOS and DMOS scores of the proposed method are better than those of a speaker-dependent model trained on only one speaker's data.
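The question set generation step can be illustrated with a minimal sketch. The abstract does not give the thesis's phone classes or label format, so everything specific here is an assumption: the phone groupings in `INITIAL_CLASSES` are illustrative only, the function name `make_questions` is hypothetical, and the context delimiters (`^`, `-`, `+`, `=`) follow the convention commonly seen in HTS demo labels, not necessarily the six-layer format designed in the thesis. The output lines use the `QS "name" {pattern,...}` form read by HTS/HTK decision tree clustering.

```python
# Illustrative Mandarin initial classes; NOT the thesis's actual categories.
INITIAL_CLASSES = {
    "Plosive": ["b", "p", "d", "t", "g", "k"],
    "Nasal": ["m", "n"],
    "Fricative": ["f", "s", "sh", "x", "h"],
}


def make_questions(classes):
    """Return HTS-style QS lines for left, centre, and right phone context."""
    # Assumed label layout: ...^left-centre+right=... (HTS demo convention).
    patterns = {"L": "*^{}-*", "C": "*-{}+*", "R": "*+{}=*"}
    lines = []
    for position, pattern in patterns.items():
        for name, phones in classes.items():
            # One wildcard pattern per phone in the class, comma-separated.
            items = ",".join(pattern.format(p) for p in phones)
            lines.append('QS "{}-{}" {{{}}}'.format(position, name, items))
    return lines


if __name__ == "__main__":
    for line in make_questions(INITIAL_CLASSES):
        print(line)
```

Each class yields three questions, one per context position. Questions on prosodic (rhythm) features would be generated the same way, with patterns matching the corresponding label fields instead of the phone fields.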

Keywords/Search Tags:

HMM, Multi-speaker, Adaptation, Speech synthesis

Related items

1	Research And Implementation On Speaker Speech Adaptive Technique
2	Research And Implementation Of Multi-Speaker Speech Synthesis System For Audio Novels
3	Research On Speaker Adaptation In Speech Recognition
4	Research On Personalized Speech Synthesis Based On Deep Speech Representations
5	Speaker Adaptation Of DNN-HMM Acoustic Model For Speech Recognition
6	Research On Speaker Adaptation Methods Based On RNN-BLSTM Acoustic Model
7	Research And Implementation Of Speech Synthesis Based On FastSpeech
8	Research On Statistical Parametric Speech Synthesis Integrating Speech Production Mechanisms
9	Research And Implementation Of Speech Synthesis Method For Helping Old Robots
10	Research On Speaker Adaptation Of Neural Network Acoustic Models For Speech Recognition