
Research On Acoustic Modeling Methods In Statistical Parametric Speech Synthesis

Posted on: 2013-09-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Lei
Full Text: PDF
GTID: 1228330377451758
Subject: Signal and Information Processing
Abstract/Summary:
Over the past decades, hidden Markov model (HMM) based statistical parametric speech synthesis (SPSS) has been proposed and has become a mainstream text-to-speech method, alongside the unit selection and waveform concatenation approach. The method builds on techniques from automatic speech recognition, and several key techniques have been proposed specifically for speech synthesis, such as the multi-space probability distribution HMM and maximum likelihood parameter generation. Compared with unit selection and waveform concatenation synthesis, it offers many advantages, including high smoothness, robustness, flexibility, a small system footprint, and fast, automatic system construction. However, a significant gap in naturalness and quality remains between natural speech and speech synthesized by this method.

On the other hand, speech synthesis is a typical interdisciplinary subject, and knowledge of phonetics plays an important role in building a speech synthesis system. Relevant knowledge includes the speech production mechanism, speech perception, the prosodic patterns of F0, the properties of articulator movement, the properties of formants, and so on. In HMM-based SPSS, however, the use of this knowledge is limited: it is involved only in the feature extraction and waveform reproduction steps, while acoustic model training depends entirely on data-driven machine learning approaches. The possible improvement in the quality, naturalness and flexibility of synthesized speech is thus constrained.

This dissertation draws on phonetic knowledge and focuses on acoustic modeling in HMM-based SPSS. Starting from conventional HMM-based SPSS, two aspects, the model training criterion and the model structure, are investigated with respect to phonetic knowledge such as speech production and speech perception; the model structure aspect comprises research on the F0 model and the spectral model. Using phonetic knowledge, we integrate speech perception knowledge into the model training criterion and investigate the shortcomings of the current method and the possibilities for improving it. To improve the quality of synthesized speech and the flexibility of HMM-based SPSS, we study the acoustic modeling of the F0 model and the spectral model separately, in each case taking the relevant specific knowledge into account.

The dissertation is organized as follows:

Chapter 1 is the introduction. It reviews the background and history of speech synthesis research and gives a brief introduction to several speech synthesis techniques.

Chapter 2 introduces HMM-based SPSS in detail, including its fundamental principles, system framework and related key techniques. Based on an analysis of this method, the motivation for our research is stated.

Chapter 3 considers phonetic knowledge of speech perception and focuses on the lack of naturalness of speech synthesized by conventional HMM-based SPSS; model training criteria for the acoustic model are investigated.

Chapter 4 studies the F0 model in HMM-based SPSS. By analyzing the disadvantages of the current F0 modeling method in HMM-based SPSS, we propose a new F0 modeling method that incorporates the prosody production mechanism and uses multiple layers to model F0 features. The proposed method is compared with other methods. Objective and subjective experiments show that it models and predicts F0 effectively and improves the synthesized speech.

Chapter 5 studies the spectral model in HMM-based SPSS. Considering the clear representation offered by formant features and the close relationship between formants and speech production, we introduce formant features into HMM-based SPSS. The relationship between formant features and conventional spectral features is captured by a two-stream model structure. The prediction of spectral features depends on formant features, and the flexibility and controllability of HMM-based SPSS are improved.

Chapter 6 concludes the dissertation.
Keywords/Search Tags: speech synthesis, hidden Markov model, parametric synthesis, model training criterion, F0 model, formant features