Research On Statistical Acoustic Model Based Speech Synthesis

Posted on: 2009-01-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z H Ling
GTID: 1118360242495814
Subject: Signal and Information Processing
Abstract/Summary:
With the development of statistical modeling techniques for speech signals and improvements in the performance of parametric speech synthesizers, statistical parametric speech synthesis methods have been proposed and have made significant progress over the last decade. One representative approach is Hidden Markov Model (HMM) based parametric synthesis, which has become a mainstream speech synthesis approach alongside unit selection and waveform concatenation. Compared with conventional unit selection synthesis, this method offers many advantages, including high smoothness, robustness, and flexibility; fast and automatic system construction; and a small system footprint.

This dissertation focuses on the application of statistical acoustic models to speech synthesis. In addition to the original HMM-based parametric synthesis approach, two novel methods are proposed. The first is HMM-based unit selection and waveform concatenation synthesis. We apply the statistical ideas of HMM-based parametric synthesis to a unit selection and waveform concatenation system in order to overcome the limited speech quality of parametric synthesis systems and improve the naturalness of the synthesized speech. The second method is parametric synthesis with integrated acoustic and articulatory features. Because articulatory features give a better representation of the speech generation mechanism, we integrate them into the HMM-based parametric synthesis system to improve the accuracy and flexibility of acoustic parameter generation through the simultaneous modeling and generation of acoustic and articulatory features.

The dissertation is organized as follows:

Chapter 1 is the introduction. It reviews the history of speech synthesis research and briefly introduces the most common speech synthesis techniques.

Chapter 2 introduces the HMM-based parametric synthesis method in detail, including the fundamental principles of HMMs, the system framework, and several key techniques used in the system. The motivation for our research is then stated, based on an analysis of the characteristics of this method.

Chapter 3 focuses on the HMM-based unit selection synthesis method. First, two different HMM-based unit selection systems are introduced. The first system adopts frame-sized units and a maximum likelihood criterion for unit selection; the second uses hierarchical units and combines Kullback-Leibler divergence with a likelihood criterion to select the optimal unit sequence. Then, a unified framework for HMM-based unit selection speech synthesis is proposed. Our evaluations on Chinese and English systems demonstrate the effectiveness of the proposed method. Finally, a Minimum Unit Selection Error (MUSE) criterion for training the models of an HMM-based unit selection system is proposed, in order to achieve fully automatic system construction and improve the naturalness of the synthesized speech.

Chapter 4 presents a method that integrates articulatory features into the original HMM-based parametric synthesis system, in which only acoustic features are used. Here, "articulatory features" refers to the quantitative positions and continuous movements of a group of articulators, including the tongue, jaw, lips, and velum. After a brief introduction to the original system, modeling and parameter generation methods for unified acoustic and articulatory features are proposed. Three model structures are explored to allow the articulatory features to influence acoustic modeling: model clustering, state synchrony, and cross-stream feature dependency. The results of objective and subjective evaluations show that the proposed method effectively improves the accuracy and flexibility of acoustic parameter prediction.

Chapter 5 concludes the dissertation.
Keywords/Search Tags: speech synthesis, hidden Markov model, parametric synthesis, unit selection, articulatory features