
The Method And Implementation Of ToBI Automatic Prosodic Labeling In English Text To Speech System

Posted on: 2017-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Wang
GTID: 2308330488465243
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid spread of the Internet, speech synthesis technology has grown quickly. As a branch of artificial intelligence, speech synthesis is moving toward machines that speak at a human level, so prosodic expressiveness, a key factor in the quality of synthesized speech, is receiving more and more attention. This thesis studies automatic ToBI prosodic labeling and evaluates the effect of loading the automatically labeled prosody into an English TTS system. The specific work is as follows.

First, the thesis describes the background and historical development of speech synthesis technology and introduces a variety of synthesis methods, including the two mainstream approaches: HMM-based parametric synthesis and concatenative synthesis based on a large corpus. Given the importance of the ToBI system, Chapter II introduces it in detail.

Second, the following chapters describe the C4.5 decision tree algorithm, the maximum entropy algorithm, and the conditional random field (CRF) algorithm, and present several training models and testing methods used in the implementation. By analyzing and comparing the different models and prosodic labels, different prosody models can be used for automatic labeling and loaded into the English TTS system.

Finally, the thesis obtains direct quantitative results by comparing the predictions of several models. The results show that the C4.5 decision tree and the CRF model can be used effectively to predict ToBI labels. After the prosody prediction model was added, a subjective MOS listening test was conducted on speech synthesized by the English TTS system. Compared with the previous MOS score, the new sentences scored 0.31 higher, a clear improvement in prosody.
This further demonstrates that the experimental ideas and methods in the thesis are reliable. In addition, the thesis summarizes the experimental results, identifies several aspects of automatic ToBI labeling that could be optimized, and offers an outlook and recommendations for ToBI prosody prediction.
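The abstract names C4.5 as one of the two models that predicted ToBI labels effectively but does not describe how the tree chooses its splits. As a minimal sketch, and not the thesis's own implementation, the code below computes C4.5's gain ratio for a categorical feature and picks the better of two candidate root splits. The feature names (`pos`, `position`), the `NONE` label for unaccented words, and the four-word dataset are hypothetical toy values chosen only for illustration; `H*` is the standard ToBI high pitch accent.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, feature):
    """C4.5 gain ratio for splitting on one categorical feature.

    rows   -- list of dicts mapping feature name -> value (one dict per word)
    labels -- list of class labels aligned with rows
    """
    n = len(labels)
    # Group the labels by the value this feature takes for each word.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    # Information gain: parent entropy minus size-weighted child entropy.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalizes features with many distinct values.
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: four words, two features, ToBI accent labels.
rows = [
    {"pos": "content", "position": "final"},
    {"pos": "content", "position": "medial"},
    {"pos": "function", "position": "final"},
    {"pos": "function", "position": "medial"},
]
labels = ["H*", "H*", "NONE", "NONE"]

# Choose the feature with the highest gain ratio as the root split.
best = max(["pos", "position"], key=lambda f: gain_ratio(rows, labels, f))
print(best)  # pos
```

On this toy data the part-of-speech feature separates accented from unaccented words perfectly (gain ratio 1.0), while position carries no information (gain ratio 0.0), so `pos` becomes the root split. The full C4.5 algorithm also handles continuous attributes with threshold splits, missing values, and pruning, none of which is shown here.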
Keywords/Search Tags: Speech synthesis, ToBI prosodic annotation, C4.5 decision tree, CRF model, Prosody prediction