Research On Deep Neural Network Based Chinese Speech Synthesis

Posted on: 2015-10-31  Degree: Master  Type: Thesis
Country: China  Candidate: Z Zhang  Full Text: PDF
GTID: 2298330422993489  Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of speech synthesis technology, it has become more and more important in people's lives. The Hidden Markov Model based statistical parametric speech synthesis system (HTS) has become the most popular system because its parameters can easily be adjusted to generate different styles of voice. However, HTS still has some significant drawbacks, such as over-smoothing of the spectrum and a lack of detail in the waveform. In this paper, we focus on parameter conversion, using Deep Neural Networks (DNN) to train a series of conversion networks that transform the spectrum parameters of the synthesized waveform, in order to improve HTS synthesis performance with a limited amount of data.

1. Factors such as the depth and structure of a DNN lead to different learning results. We compare the silence/unvoiced/voiced (s/u/v) recognition results of DNNs with different parameters and structures, explore the influence of these factors, and show the effectiveness of the DNN system in s/u/v classification.

2. We believe that the performance degradation of HTS is mainly due to the loss of spectrum details during statistical training. We therefore select a parallel corpus from the original corpus and the synthesized corpus, then use a DNN to train a series of conversion networks that transform the spectrum parameters of the synthesized speech. Experimental results show that this system effectively improves the performance of HTS.

3. To enhance the quality further, we use the Temporal Decomposition (TD) algorithm to obtain the event functions and event vectors of the speech. Research has shown that event functions affect the intelligibility of speech, while event vectors affect its naturalness. We transform the event vectors of the synthesized speech using the same method as above, in order to enhance only the naturalness. Experimental results show that this method effectively improves the sound quality of the synthesized speech.
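The conversion idea in item 2 can be illustrated with a minimal sketch. The abstract does not state the network architecture, the spectral features, or the training procedure, so everything below is an assumption made for illustration: a single hidden layer, a mean-squared-error loss, plain gradient descent, and toy random data in which the "synthesized" frames are a shrunken, noisy copy of the "natural" frames as a crude stand-in for over-smoothing. The function names are hypothetical and do not come from the thesis.

```python
import numpy as np


def make_toy_parallel_frames(n=500, dim=8, seed=1):
    """Toy stand-in for a parallel corpus (assumption, not real speech data)."""
    rng = np.random.default_rng(seed)
    natural = rng.normal(size=(n, dim))
    # Crude imitation of over-smoothing: shrink toward the mean, add small noise.
    synthesized = 0.5 * natural + 0.05 * rng.normal(size=(n, dim))
    return synthesized, natural


def train_conversion_net(src, tgt, hidden=32, epochs=500, lr=0.05, seed=0):
    """Fit a one-hidden-layer network mapping src frames to tgt frames (MSE)."""
    rng = np.random.default_rng(seed)
    d_in, d_out = src.shape[1], tgt.shape[1]
    w1 = rng.normal(0.0, 0.1, (d_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, d_out))
    b2 = np.zeros(d_out)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(src @ w1 + b1)
        pred = h @ w2 + b2
        err = pred - tgt
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the mean-squared-error loss.
        g_pred = 2.0 * err / err.size
        g_w2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_z = (g_pred @ w2.T) * (1.0 - h ** 2)
        g_w1 = src.T @ g_z
        g_b1 = g_z.sum(axis=0)
        # Plain gradient-descent update.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses


def convert(params, frames):
    """Apply the trained conversion network to a batch of frames."""
    w1, b1, w2, b2 = params
    return np.tanh(frames @ w1 + b1) @ w2 + b2


if __name__ == "__main__":
    synthesized, natural = make_toy_parallel_frames()
    params, losses = train_conversion_net(synthesized, natural)
    converted = convert(params, synthesized)
    print("first loss:", losses[0], "last loss:", losses[-1])
```

A real system would replace the toy data with time-aligned spectral parameters from the original and synthesized corpora, and would likely use a deeper network; the sketch only shows the frame-to-frame regression setup.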
Keywords/Search Tags: HTS, DNN, deep learning, temporal decomposition, voice conversion