Research And Application Of Chinese Text-to-speech Based On Recurrent Neural Network

Posted on:2020-11-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y T Ying

Full Text:PDF

GTID:2428330626450739

Subject:Software engineering

Abstract/Summary:

As a key component of human-machine interaction, text-to-speech (TTS) plays an important role in everyday applications such as audio guides, electronic guidance for the blind, and intelligent voice robots. At present, waveform-concatenation TTS and statistical parametric TTS are widely used. The former uses a large inventory of natural speech segments as synthesis units; the synthesized speech is highly natural, but building the speech library takes a long time, incurs high labor costs, and generalizes poorly. The latter is usually based on hidden Markov models, and its synthesized speech quality is slightly lower overall. With the development of deep learning, improving the naturalness of synthesized speech, enriching its expressiveness, and reducing the complexity of TTS systems have become important research topics in intelligent speech. Deep-learning-based TTS is now the mainstream, and the overall research direction is end-to-end speech synthesis. A complete end-to-end Chinese TTS system based on recurrent neural networks (RNNs) is therefore of great significance and practical value. The main work of the thesis is as follows:

1. This thesis designs and implements a complete pipeline from dataset creation to training an end-to-end Chinese TTS model, extending the Tacotron model to Chinese. As part of this pipeline, an automatic Chinese speech segmentation method that proceeds from long durations to short durations is designed and applied to dataset creation.

2. A speaker-dependent TTS method based on the Tacotron model is studied and implemented, focusing on how three different Pinyin forms, used as input text sequences, affect model training.

3. For cases where the tail of the synthesized speech contains noise or repeated pronunciation, a post-processing method is proposed to handle the tail noise, with good results.

4. Speaker-independent and speaker-adaptive-training TTS methods based on the Tacotron model are studied and implemented, again focusing on how three different Pinyin forms, used as input text sequences, affect model training.

The complete end-to-end Chinese TTS method based on RNNs that is studied and implemented here greatly simplifies the cumbersome dataset creation process and manual feature engineering. The naturalness of its synthesized speech is better than that of the widely used statistical parametric speech synthesis technology.
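
The abstract does not describe how the long-duration-to-short-duration segmentation works. One plausible reading, offered here only as an illustrative sketch and not as the thesis's method, is to cut a recording at its longest pauses first and then retry oversized pieces with progressively shorter pause thresholds. The function names, the per-frame energy representation, and all parameter values below are assumptions introduced for illustration.

```python
from typing import List, Sequence, Tuple


def find_silences(energies: Sequence[float], threshold: float,
                  min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges of silent runs at least min_len long."""
    runs = []
    start = None
    for i, energy in enumerate(energies):
        if energy < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(energies) - start >= min_len:
        runs.append((start, len(energies)))
    return runs


def split_long_to_short(energies: Sequence[float], lo: int, hi: int,
                        threshold: float, max_frames: int,
                        silence_lens: Tuple[int, ...]) -> List[Tuple[int, int]]:
    """Split frames [lo, hi) at pauses, trying long pauses before short ones.

    silence_lens lists minimum pause lengths in decreasing order (hypothetical
    values). A segment that is still too long after all thresholds are
    exhausted is returned unchanged.
    """
    if hi - lo <= max_frames or not silence_lens:
        return [(lo, hi)]
    pauses = find_silences(energies[lo:hi], threshold, silence_lens[0])
    # Cut at the midpoint of each interior pause; pauses touching an edge
    # cannot shorten the segment, so they are ignored.
    cuts = [lo + (s + e) // 2 for s, e in pauses if s > 0 and e < hi - lo]
    bounds = [lo] + cuts + [hi]
    segments = []
    for a, b in zip(bounds, bounds[1:]):
        # Each piece is retried with the remaining, shorter pause thresholds.
        segments.extend(split_long_to_short(
            energies, a, b, threshold, max_frames, silence_lens[1:]))
    return segments


# Toy input: 1.0 marks a speech frame, 0.0 a silent frame.
energies = [1.0] * 6 + [0.0] * 6 + [1.0] * 5 + [0.0] * 2 + [1.0] * 5
segments = split_long_to_short(energies, 0, len(energies),
                               threshold=0.5, max_frames=10,
                               silence_lens=(4, 2))
print(segments)  # [(0, 9), (9, 18), (18, 24)]
```

In this toy input the six-frame pause is cut first, and the remaining 15-frame piece is then cut at the two-frame pause, so every segment ends up within the 10-frame limit. Real recordings would need frame energies computed from audio and thresholds tuned to the data.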

Keywords/Search Tags:

text-to-speech, end-to-end, Tacotron model, speaker dependent, speaker independent and speaker adaptive training

Related items

1	Text-independent Speaker Recognition Method And System Based On Spatial Distribution Of Speech Features
2	Research On Adaptive Methods For Text-independent Speaker Recognition
3	Text-Dependent Speaker Verification System
4	Design And Implementation On Text-Dependent Speaker Recognition System
5	The Study Of Hierarchical Speaker Segmentation And Relative Algorithms
6	Research On Text-Independent Speaker Verification System
7	Speaker Adaptation Of DNN-HMM Acoustic Model For Speech Recognition
8	Any Text Speaker Recognition System
9	Text-independent Speaker Recognition
10	Research On Speaker Adaptation In Speech Recognition