Research On Statistical Acoustic Model Based Unit Selection Speech Synthesis Method

Posted on: 2015-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Song
Full Text: PDF
GTID: 2268330431950126
Subject: Signal and Information Processing
Abstract/Summary:
Since the end of the twentieth century, speech synthesis based on statistical acoustic models has developed quickly because it offers automatic system construction and stable synthesis performance, and it has gradually become a focus of speech synthesis research. The hidden Markov model (HMM) is the most common form of statistical acoustic model. HMM-based speech synthesis falls into two categories: HMM-based parametric speech synthesis and HMM-based unit selection speech synthesis. In unit selection synthesis, a suitable unit sequence is selected from a prerecorded corpus according to the text of the sentence to be synthesized, and the waveforms of the selected units are then concatenated to form the synthetic speech. Given a corpus with sufficient data, unit selection synthesis produces synthetic speech of higher quality and naturalness than parametric synthesis.

This thesis studies HMM-based unit selection speech synthesis and addresses two existing problems in the unit selection criterion. First, in HMM-based unit selection, separate statistical acoustic models are trained for different acoustic features in the training phase, and the likelihoods derived from these models are combined to form the unit selection criterion in the unit selection phase. In the existing method, the parameters of the statistical acoustic models can be estimated from a training corpus under the maximum likelihood criterion, but the weights used to combine the different models cannot be obtained through automatic training. Because these model weights have a significant impact on the naturalness of synthetic speech, an approach that optimizes the model weights based on synthetic speech quality is proposed. Second, in traditional spectral feature modeling, each HMM state is modeled by a single Gaussian distribution with a diagonal covariance matrix, whose ability to describe high-dimensional spectral features is insufficient. The traditional method therefore adopts features with weak inter-dimension correlation, such as Mel-cepstra. However, Mel-cepstra lose many spectral details compared with high-dimensional spectral features such as spectral envelopes. An approach to spectral modeling and unit selection based on the restricted Boltzmann machine (RBM) is therefore proposed. The unit selection criterion is revised and the naturalness of synthetic speech is improved.

The thesis is organized as follows.

The first chapter is the introduction. It presents the fundamentals of speech synthesis, reviews the history of the field, and introduces the common methods and current research focus areas of speech synthesis technology.

The second chapter introduces HMM-based unit selection speech synthesis, including the basic principles of the HMM, the framework of system construction, the key technologies, and the advantages and disadvantages of the method. It then sets out the motivations for the research that follows.

The third chapter introduces an approach that optimizes the model weights based on synthetic speech quality assessment. A small number of manual listening test results are first collected according to the default combination of weights. Multivariate adaptive regression splines are then used to construct a model that predicts synthetic speech quality under different model weights. The optimal weights are searched for automatically by a pattern search algorithm based on this prediction model. The experimental results show that the method can effectively optimize the model weights and improve the naturalness of synthetic speech.

The fourth chapter proposes an RBM-based spectral modeling and unit selection method. In the training phase, an RBM is used to model the spectrum for each HMM state. In the synthesis phase, the RBM is used to calculate the log likelihood of the spectral features of each unit, and a piecewise linear mapping function is used to construct the target cost for unit selection. The experimental results show that this method can effectively improve the naturalness of synthetic speech. In addition, different methods of applying the modified target cost function to unit selection are analyzed, and an RBM-based modeling method for concatenative spectral features is also investigated.

The fifth chapter summarizes the thesis.
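
The weight optimization described for the third chapter can be illustrated with a small sketch. It is not the thesis's implementation: `predicted_quality` is a hypothetical quadratic stand-in for the regression-spline quality predictor, the three weights and their target values are invented for illustration, and the compass-style search shown is only one simple form of pattern search.

```python
from typing import Callable, List, Sequence, Tuple


def pattern_search(
    score: Callable[[Sequence[float]], float],
    start: Sequence[float],
    step: float = 0.5,
    min_step: float = 1e-4,
    lower: float = 0.0,
) -> Tuple[List[float], float]:
    """Maximize `score` with a compass-style pattern search.

    Each pass probes +step and -step along every coordinate and keeps
    any move that raises the score. If a full pass brings no improvement,
    the step is halved; the search stops once step < min_step.
    """
    best = list(start)
    best_score = score(best)
    while step >= min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                cand = list(best)
                # Assumption: model weights are kept non-negative.
                cand[i] = max(lower, cand[i] + delta)
                cand_score = score(cand)
                if cand_score > best_score:
                    best, best_score = cand, cand_score
                    improved = True
        if not improved:
            step /= 2.0
    return best, best_score


def predicted_quality(weights: Sequence[float]) -> float:
    # Hypothetical stand-in for a trained quality predictor: a concave
    # function that peaks at an invented "ideal" weight combination.
    target = (1.0, 2.0, 0.5)
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


# Start from a default combination in which all model weights are equal.
default_weights = [1.0, 1.0, 1.0]
best_weights, best_quality = pattern_search(predicted_quality, default_weights)
print(best_weights, best_quality)
```

Pattern search needs only evaluations of the predictor, not its gradient, which suits a prediction model whose derivatives are inconvenient to work with.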
Keywords/Search Tags: speech synthesis, unit selection, hidden Markov model, restricted Boltzmann machine