Research On Deep Neural Network Based Chinese Speech Synthesis

Posted on:2015-10-31

Degree:Master

Type:Thesis

Country:China

Candidate:Z Zhang

Full Text:PDF

GTID:2298330422993489

Subject:Electronics and Communications Engineering

Abstract/Summary:

With the rapid development of speech synthesis technology, it has become increasingly important in people's lives. The Hidden Markov Model based statistical parametric speech synthesis system (HTS) has become the most popular system because its parameters are easy to adjust to generate different styles of speech. However, HTS still has significant drawbacks, such as over-smoothing of the spectrum and a lack of detail in the waveform. In this paper, we focus on parameter conversion: we use Deep Neural Networks (DNN) to train a series of conversion networks that transform the spectral parameters of the synthesized waveform, in order to improve HTS synthesis performance with a limited amount of data.

1. Factors such as the depth and structure of a DNN lead to different learning results. We compare silence/unvoiced/voiced (s/u/v) recognition results obtained with DNNs of different parameters and structures, explore the influence of these factors, and show the effectiveness of the DNN system in s/u/v classification.

2. We believe that the performance degradation of HTS is mainly due to the loss of spectral detail during statistical training. We therefore extract a parallel corpus from the original corpus and the synthesized corpus, and use DNNs to train a series of conversion networks that transform the spectral parameters of the synthesized speech. Experimental results show that this system effectively improves the performance of HTS.

3. To further enhance quality, we use the Temporal Decomposition (TD) algorithm to obtain the event functions and event vectors of the speech. Research has shown that event functions affect the intelligibility of speech, while event vectors affect its naturalness. We transform the event vectors of the synthesized speech using the same method as above, so as to enhance only naturalness. Experimental results show that this method effectively improves the sound quality of synthesized speech.
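The conversion idea described in the abstract (learning a mapping from synthesized spectral parameters to natural ones on a parallel corpus) might be sketched roughly as below. This is only an illustrative sketch, not the thesis's actual system: the single hidden layer, the tanh activation, the mean squared error loss, the hyperparameters, and the function names `train_conversion_net` and `convert` are all assumptions, and the demo data are random stand-ins for aligned spectral frames.

```python
import numpy as np


def train_conversion_net(src, tgt, hidden=32, epochs=300, lr=0.1, seed=0):
    """Fit a one-hidden-layer network mapping src frames to tgt frames.

    src, tgt: arrays of shape (n_frames, n_dims), assumed frame-aligned
    (a hypothetical stand-in for a parallel corpus).
    Returns the learned parameters and the per-epoch MSE values.
    """
    rng = np.random.default_rng(seed)
    d_in, d_out = src.shape[1], tgt.shape[1]
    W1 = rng.normal(0.0, 0.1, (d_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d_out))
    b2 = np.zeros(d_out)
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, linear output layer
        h = np.tanh(src @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - tgt
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error
        g_pred = 2.0 * err / err.size
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_z = (g_pred @ W2.T) * (1.0 - h ** 2)
        g_W1 = src.T @ g_z
        g_b1 = g_z.sum(axis=0)
        # Full-batch gradient descent step
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses


def convert(params, frames):
    """Apply the trained conversion network to new frames."""
    W1, b1, W2, b2 = params
    return np.tanh(frames @ W1 + b1) @ W2 + b2


if __name__ == "__main__":
    # Synthetic stand-in data: "natural" frames, and an attenuated, noisy
    # copy that loosely imitates over-smoothed synthesized frames.
    rng = np.random.default_rng(1)
    natural = rng.normal(size=(500, 8))
    synthesized = 0.6 * natural + 0.05 * rng.normal(size=(500, 8))
    params, losses = train_conversion_net(synthesized, natural)
    print("first and last training MSE:", losses[0], losses[-1])
```

In practice the inputs would be real spectral parameters extracted from time-aligned natural and synthesized utterances, and a deeper network would likely be needed; the sketch only shows the shape of the training loop.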

Keywords/Search Tags:

HTS, DNN, deep learning, temporal decomposition, voice conversion

Related items

1	Voice Conversion Based On AHOcoder And GMM Model
2	An Algorithm For Voice Conversion With Noise Robustness
3	A Study On Deep Learning-Based Voice Conversion For Identity Disguise In Voice Communication
4	Neural Network Based Voice Conversion
5	Research And Implementation Of Voice Conversion System Based On Deep Learning
6	The Study On Feature Descriptor Learning Based On Deep Learning
7	Study On The Neural Network Modelling Method For Voice Conversion
8	Voice Conversion Using Deep Belief Network In Super-frame Feature Space
9	Voice Conversion Using STRAIGHT Model And Deep Belief Network
10	Age-Voice Conversion System Driven By Multi-Parameter