Research On Statistical Parametric Speech Synthesis Based On Speaker Adaptive Training

Posted on: 2014-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: W L Song
GTID: 2268330422459603
Subject: Electronics and Communications Engineering
Abstract/Summary:
The quality of synthetic speech has improved in recent years. To fully convey the information contained in speech signals, a Text-To-Speech (TTS) system needs to generate natural speech with the voice characteristics of an arbitrary speaker. Although large-corpus waveform-concatenation speech synthesis systems can synthesize speech with high intelligibility and naturalness, recording and preparing a large speech database is time-consuming and laborious, and it is difficult to synthesize speech for different speakers and various emotions. An HMM (Hidden Markov Model) based statistical parametric speech synthesis system can generate the voices of various speakers, but the naturalness of the synthetic speech is not very good. Therefore, speaker adaptation based speech synthesis, which synthesizes speech with the target speaker's voice characteristics from a small amount of training speech uttered by the target speaker, has become an active research topic.

This thesis studies HMM-based statistical parametric speech synthesis. A Mandarin speech synthesis system was built using speaker adaptive training and speaker adaptation transforms. The system can generate synthetic speech with the characteristics of different target speakers. Experimental results show that the proposed system performs better than a system that synthesizes speech from a speaker dependent (SD) model trained on only one speaker's data. The main work and achievements are as follows.

Firstly, the initials and finals, taken as the basic units of Mandarin pronunciation, were classified according to the "Modern Chinese Dictionary". The context-dependent information of each initial and final was extracted from the sentences. Based on an analysis of this sentence-level context information, an annotation scheme was designed for labeling it. The context information is divided into six layers, and the label format and label content of each layer are determined by the context-dependent information of that layer. An HTS-oriented labeling program was developed to generate full context-dependent labels.

Secondly, a question set generation program was developed for the HTS system according to the pronunciation characteristics and prosodic features of standard Mandarin. It generates questions about segmental features and rhythm features, which the HTS system uses for decision tree clustering of the model states.

Thirdly, a multi-speaker HMM-based speech synthesis system was implemented using the speaker adaptive training method. The system trains an average voice model from the training sentences of several speakers, and obtains a speaker dependent model for each target speaker by applying a speaker adaptation transform. Experimental results show that the MOS and DMOS scores of the proposed method are better than those of a speaker dependent model trained on only one speaker's data.
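As an illustration of what a question set generation program for decision tree clustering might produce, here is a minimal Python sketch. It is not the thesis's program. The phone classes, the `left-current+right` label layout, and the names `PHONE_CLASSES`, `POSITION_PATTERNS`, `make_question`, and `build_question_set` are all assumptions made for this example. The abstract does not give the actual six-layer label format or question inventory. The output lines follow the `QS "name" {pattern,...}` style commonly seen in HTS question files.

```python
# Hypothetical phone classes for a few Mandarin initials (illustrative only;
# the thesis's real classification is not given in the abstract).
PHONE_CLASSES = {
    "Stop": ["b", "p", "d", "t", "g", "k"],
    "Nasal": ["m", "n"],
}

# Assumed label layout "left-current+right". Each template matches one
# position of that layout with wildcards around the phone of interest.
POSITION_PATTERNS = {
    "L": "{phone}-*",     # left-context phone
    "C": "*-{phone}+*",   # current phone
    "R": "*+{phone}",     # right-context phone
}


def make_question(position, class_name, phones):
    """Build one question line asking whether the phone at `position`
    belongs to the class `class_name`."""
    template = POSITION_PATTERNS[position]
    patterns = ",".join(template.format(phone=p) for p in phones)
    return f'QS "{position}-{class_name}" {{{patterns}}}'


def build_question_set(classes):
    """Return one question per (position, phone class) pair."""
    return [
        make_question(position, name, phones)
        for position in POSITION_PATTERNS
        for name, phones in classes.items()
    ]


if __name__ == "__main__":
    for line in build_question_set(PHONE_CLASSES):
        print(line)
```

A real question set would also cover the prosodic (rhythm) layers of the label. Under the same assumptions, those could be added as further entries in `POSITION_PATTERNS`, each matching the relevant field of the full-context label.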
Keywords/Search Tags: HMM, Multi-speaker, Adaptation, Speech synthesis