
Research And Application Of Chinese Text-to-speech Based On Recurrent Neural Network

Posted on: 2020-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Ying
GTID: 2428330626450739
Subject: Software engineering
Abstract/Summary:
As a key component of human-machine interaction, text-to-speech (TTS) plays an important role in everyday applications such as audio guides, electronic aids for the visually impaired, and intelligent voice robots. At present, waveform-concatenation TTS and statistical parametric TTS are widely used. The former uses a large inventory of natural speech segments as synthesis units; the synthesized speech is highly natural, but building the speech library takes a long time, incurs high labor costs, and generalizes poorly. The latter is usually based on hidden Markov models (HMMs), and its overall speech quality is slightly lower. With the development of deep learning, improving the naturalness of synthesized speech, enriching its expressiveness, and reducing the complexity of TTS systems have become important research topics in intelligent speech. Deep-learning-based TTS is now mainstream, and research is moving toward end-to-end speech synthesis. A complete end-to-end Chinese TTS system based on recurrent neural networks (RNNs) is therefore of considerable significance and practical value. The main work of the thesis is as follows:

1. This thesis designs and implements a complete pipeline, from dataset creation to training an end-to-end Chinese TTS model. The Tacotron model is extended to Chinese TTS. As part of this pipeline, an automatic Chinese speech segmentation method that proceeds from long-duration to short-duration segments is designed and applied to dataset creation.
2. A speaker-dependent TTS method based on the Tacotron model is studied and implemented, with a focus on how three different Pinyin forms, used as the input text sequence, affect model training.
3. For cases where the tail of the synthesized speech contains noise or repeated pronunciation, a post-processing method is proposed to handle the tail noise, and it gives good results.
4. Speaker-independent and speaker-adaptive-training TTS methods based on the Tacotron model are studied and implemented, again with a focus on how three different Pinyin forms, used as the input text sequence, affect model training.

The complete end-to-end, RNN-based Chinese TTS method studied and implemented in this thesis greatly simplifies the cumbersome dataset creation process and manual feature engineering. The naturalness of the synthesized speech is better than that of the widely used statistical parametric speech synthesis.
Keywords/Search Tags:text-to-speech, end-to-end, Tacotron model, speaker dependent, speaker independent and speaker adaptive training
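
The abstract does not say which three Pinyin forms were compared, so the following is only a hypothetical illustration of what "different Pinyin forms as text sequences" could look like. It assumes tone-numbered Pinyin as input and shows three possible tokenizations: whole syllables, individual characters, and initial/final pairs. The function names and the choice of forms are assumptions made for this sketch, not the thesis's method; treating `y` and `w` as initials is a simplifying convention.

```python
# Hypothetical sketch: the abstract does not name the three Pinyin forms.
# Input is assumed to be tone-numbered Pinyin, e.g. "zhong1 wen2".

# Two-letter initials come first so that "zh" is matched before "z".
# "y" and "w" are treated as initials here for simplicity (an assumption).
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_initial_final(syllable):
    """Split a syllable such as 'zhong1' into ('zh', 'ong1')."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable such as 'an1'


def to_syllables(pinyin):
    """Form A (assumed): one token per tone-numbered syllable."""
    return pinyin.split()


def to_letters(pinyin):
    """Form B (assumed): one token per character, with ' ' between syllables."""
    tokens = []
    for i, syllable in enumerate(pinyin.split()):
        if i > 0:
            tokens.append(" ")
        tokens.extend(syllable)
    return tokens


def to_initial_final(pinyin):
    """Form C (assumed): initial and tone-carrying final as separate tokens."""
    tokens = []
    for syllable in pinyin.split():
        initial, final = split_initial_final(syllable)
        if initial:
            tokens.append(initial)
        tokens.append(final)
    return tokens
```

Under these assumptions the three forms trade vocabulary size against sequence length: syllable tokens give short sequences with a large vocabulary, while character tokens give long sequences with a very small one.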