Research On Statistical Parametric Speech Synthesis Integrating Speech Production Mechanisms

Posted on: 2016-08-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Q Cai
GTID: 1228330467995015
Subject: Signal and Information Processing
Abstract/Summary:
After decades of development, statistical parametric speech synthesis has become a mainstream speech synthesis approach, alongside unit selection and waveform concatenation. The hidden Markov model (HMM) is the representative acoustic modeling approach in statistical parametric speech synthesis. It has many advantages over conventional unit selection synthesis: automatic system construction, fast adaptation, smooth output, and a small system footprint. However, the quality and naturalness of the speech it synthesizes are still worse than those of the unit selection and waveform concatenation approach. In addition, at its present stage the method does not perform well for diversified and personalized speech synthesis. On the one hand, it is very difficult to integrate phonetic knowledge into the system and to control the generation of acoustic features directly when corresponding training data are not available. On the other hand, when only a small amount of the target speaker's data is available for speaker adaptation, both the naturalness of the synthetic speech and its similarity to the target speaker still need improvement.
The acoustic model structures used in current statistical parametric speech synthesis methods lack a description of the speech production mechanism. This dissertation therefore focuses on acoustic models that integrate speech production mechanisms. Articulatory features and formant features are used as an intermediate level between the top level of phonetic specifications and the bottom level of acoustic observations. This hierarchical structure simulates the actual speech production process.
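The smoothness listed among the advantages of HMM-based synthesis comes, in typical systems, from generating the parameter trajectory under dynamic-feature constraints (maximum-likelihood parameter generation). The following Python sketch solves that problem for one feature dimension with static and delta streams. The delta window, the boundary handling, and the function names are assumptions for illustration, not details from the dissertation.

```python
# Minimal MLPG sketch for one feature dimension (static + delta streams).
# Assumptions (not from the dissertation): diagonal variances, delta window
# 0.5 * (c[t+1] - c[t-1]), and edge frames replicated at the boundaries.


def delta_row(t, T):
    """Coefficients of the delta feature at frame t as a linear function of c."""
    row = [0.0] * T
    row[max(t - 1, 0)] -= 0.5      # c[t-1], replicated at the left edge
    row[min(t + 1, T - 1)] += 0.5  # c[t+1], replicated at the right edge
    return row


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def mlpg(static_mean, static_var, delta_mean, delta_var):
    """Static trajectory c maximizing the likelihood of o = W c,
    i.e. the solution of (W^T P W) c = W^T P mu with P = diag(1 / var)."""
    T = len(static_mean)
    rows, mus, precs = [], [], []
    for t in range(T):
        unit = [0.0] * T
        unit[t] = 1.0
        rows += [unit, delta_row(t, T)]
        mus += [static_mean[t], delta_mean[t]]
        precs += [1.0 / static_var[t], 1.0 / delta_var[t]]
    A = [[sum(p * r[i] * r[j] for r, p in zip(rows, precs)) for j in range(T)]
         for i in range(T)]
    b = [sum(p * r[i] * mu for r, p, mu in zip(rows, precs, mus)) for i in range(T)]
    return solve(A, b)


# Step-shaped static means; the delta constraint pulls the frames together.
print([round(v, 3) for v in mlpg([0.0, 0.0, 1.0, 1.0], [1.0] * 4, [0.0] * 4, [0.1] * 4)])
# -> [0.417, 0.417, 0.583, 0.583]
```

A smaller delta variance pulls the trajectory further away from the abrupt step in the static means. A very large delta variance leaves the static means almost unchanged.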
The main work of this dissertation is as follows. First, acoustic modeling that integrates articulatory features is investigated. A Chinese multi-speaker articulatory database is carefully constructed, and the effectiveness of a two-stream model structure for jointly modeling acoustic and articulatory features is verified. An acoustic modeling method that integrates a target-filtering model with a multiple regression hidden Markov model (MRHMM) is then proposed, which yields a speech synthesis system that can be controlled using phonetic knowledge. Second, formant features are used as the intermediate level between phonetic specifications and acoustic features. An acoustic modeling method based on the hidden trajectory model (HTM) is proposed for speech synthesis; it improves the precision of spectrum prediction and the naturalness of synthetic speech, and it enables formant-controllable speech synthesis. Furthermore, an HTM-based model adaptation method is proposed to improve the naturalness and speaker similarity of synthetic speech in speaker conversion.
The dissertation is organized as follows.
Chapter 1 is the introduction. It introduces the speech production process and several of the most common speech synthesis techniques.
Chapter 2 introduces the HMM-based statistical parametric synthesis method in detail, including the fundamental principles of HMMs, the system framework, and some key techniques in the system. Based on an analysis of the characteristics of this method, the motivation for our research is stated.
Chapter 3 investigates a joint modeling method for acoustic features and articulatory features based on a two-stream model structure.
First, a method for recording and preprocessing articulatory features captured by electromagnetic articulography (EMA) is designed, and a Chinese multi-speaker articulatory database is carefully constructed. Then unified acoustic-articulatory HMMs are used for joint modeling. Finally, several aspects of this method are analyzed, including the effectiveness of context-dependent modeling, the differences among model clustering methods, and the influence of cross-stream dependency modeling.
Chapter 4 presents a controllable speech synthesis method that integrates a target-filtering model with an MRHMM. First, a target-filtering model is implemented to predict the movements of the articulators; the model is compact, and all of its parameters have a definite physical meaning. Then a controllable speech synthesis system is proposed. The results of objective and subjective tests show that the synthetic speech can be controlled effectively under the guidance of phonetic knowledge. Finally, a demo system is developed to illustrate articulatory controllable speech synthesis.
Chapter 5 presents a novel HTM-based statistical parametric speech synthesis system. First, a brief introduction to the HTM is given: an HTM is a structured generative model with a two-stage implementation. Then a framework for an HTM-based speech synthesis system is proposed. Experimental results show that the proposed method improves the accuracy of spectral feature prediction and the naturalness of synthetic speech, and that it achieves effective control over the formant characteristics of synthetic speech.
Chapter 6 presents an HTM-based model adaptation method for speaker conversion.
First, a framework for HTM-based model adaptation is proposed, which can apply separate or combined transformations to formant-related parameters and residual-related parameters. Then speaker conversion experiments show that the proposed method achieves better naturalness and speaker similarity of synthetic speech than the traditional maximum likelihood linear regression (MLLR) method.
Chapter 7 concludes the dissertation.
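Chapter 3 examines, among other things, the influence of cross-stream dependency modeling. The following minimal Python sketch shows one common way to express such a dependency, assumed here rather than taken from the dissertation: the acoustic-stream mean of an HMM state shifts linearly with the articulatory observation. The state layout (`mu_x`, `var_x`, `mu_y`, `var_y`, `A`) and the function names are hypothetical.

```python
import math


def log_gauss_diag(v, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * s) + (a - m) ** 2 / s
                      for a, m, s in zip(v, mean, var))


def state_loglik(y, x, state, cross_stream=True):
    """Joint log-likelihood of acoustic frame y and articulatory frame x in one state.

    Hypothetical layout: mu_x/var_x (articulatory stream), mu_y/var_y
    (acoustic stream) and A, a matrix mapping x onto a shift of the acoustic
    mean. With cross_stream=False the two streams are treated as independent.
    """
    ll_x = log_gauss_diag(x, state["mu_x"], state["var_x"])
    if cross_stream:
        mean_y = [m + sum(a * xi for a, xi in zip(row, x))
                  for m, row in zip(state["mu_y"], state["A"])]
    else:
        mean_y = state["mu_y"]
    return ll_x + log_gauss_diag(y, mean_y, state["var_y"])


state = {"mu_x": [1.0], "var_x": [1.0], "mu_y": [1.0], "var_y": [1.0], "A": [[2.0]]}
print(state_loglik([3.0], [1.0], state), state_loglik([3.0], [1.0], state, cross_stream=False))
```

Setting every entry of `A` to zero recovers the independent two-stream case. A cross-stream dependency analysis can therefore be framed as comparing models with and without this term.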
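The target-filtering model and MRHMM of Chapter 4 can be pictured as two stages. First, articulatory targets are smoothed into movement trajectories. Second, those trajectories act as the auxiliary vector of an MRHMM, whose state mean is a linear regression on that vector. The Python sketch below rests on three assumptions: a single articulatory dimension, a per-frame normalized bidirectional filter with weights `gamma ** |offset|` (in the style of the target filtering used in hidden trajectory models), and one regression matrix per segment. All names and values are illustrative and do not reproduce the dissertation's implementation.

```python
def target_filter(targets, durations, gamma, half_width):
    """Smooth piecewise-constant articulatory targets into a trajectory.

    targets[i], durations[i], gamma[i]: target value, length in frames and
    stiffness (0 < gamma < 1) of segment i. Each frame is a weighted average of
    the frame targets within +/- half_width frames, weight gamma ** |offset|,
    normalized per frame (a simplification of a bidirectional FIR filter).
    """
    frame_tgt, frame_gamma = [], []
    for tgt, dur, g in zip(targets, durations, gamma):
        frame_tgt += [tgt] * dur
        frame_gamma += [g] * dur
    T = len(frame_tgt)
    traj = []
    for k in range(T):
        num = den = 0.0
        for tau in range(-half_width, half_width + 1):
            j = k + tau
            if 0 <= j < T:
                w = frame_gamma[j] ** abs(tau)
                num += w * frame_tgt[j]
                den += w
        traj.append(num / den)
    return traj


def mrhmm_mean(H, x):
    """MRHMM state mean: linear regression H @ [1, x] on the control vector x."""
    xi = [1.0] + list(x)
    return [sum(h * v for h, v in zip(row, xi)) for row in H]


def controlled_means(targets, durations, gamma, half_width, H_per_segment):
    """Per-frame acoustic means driven by the filtered articulatory trajectory.
    For brevity, one regression matrix per segment is assumed."""
    traj = target_filter(targets, durations, gamma, half_width)
    seg = [i for i, d in enumerate(durations) for _ in range(d)]
    return [mrhmm_mean(H_per_segment[seg[k]], [traj[k]]) for k in range(len(traj))]


# Raising the second segment's target (e.g. a hypothetical tongue-height value)
# shifts the generated acoustic means without retraining.
H = [[[0.2, 1.5]], [[0.2, 1.5]]]
base = controlled_means([0.0, 1.0], [5, 5], [0.6, 0.6], 3, H)
raised = controlled_means([0.0, 1.4], [5, 5], [0.6, 0.6], 3, H)
print(base[-1], raised[-1])
```

Control comes from editing the targets or stiffness values, which have a physical interpretation. The trained regression matrices are left untouched.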
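The hidden trajectory model is commonly described in two stages. The first stage produces smooth formant (vocal-tract resonance) trajectories by target filtering. The second stage maps them to cepstra through the all-pole relation c_n = (2/n) Σ_k exp(-π n b_k / f_s) cos(2π n f_k / f_s). The Python sketch below shows only that second stage for a single frame, plus an optional residual mean. Whether Chapter 5 uses exactly this mapping and residual treatment is an assumption, and the function name and parameters are hypothetical.

```python
import math


def vtr_to_cepstrum(freqs_hz, bands_hz, fs_hz, order, residual_mean=None):
    """Map formant frequencies f_k and bandwidths b_k to LPC cepstra of an
    all-pole filter with poles at radius exp(-pi*b/fs), angle 2*pi*f/fs:
        c_n = (2 / n) * sum_k exp(-pi * n * b_k / fs) * cos(2 * pi * n * f_k / fs)
    An optional per-coefficient residual mean is added, as in an HTM-style
    prediction o = F(z) + r (a simplified assumption about the residual).
    """
    ceps = []
    for n in range(1, order + 1):
        c = sum(math.exp(-math.pi * n * b / fs_hz) * math.cos(2.0 * math.pi * n * f / fs_hz)
                for f, b in zip(freqs_hz, bands_hz))
        ceps.append(2.0 * c / n)
    if residual_mean is not None:
        ceps = [c + r for c, r in zip(ceps, residual_mean)]
    return ceps


# Formant control: raise F1 by 100 Hz and compare the predicted cepstra.
fs = 16000.0
base = vtr_to_cepstrum([500.0, 1500.0, 2500.0], [80.0, 100.0, 120.0], fs, 12)
shifted = vtr_to_cepstrum([600.0, 1500.0, 2500.0], [80.0, 100.0, 120.0], fs, 12)
print([round(a - b, 4) for a, b in zip(shifted, base)])
```

Each cepstral coefficient is an explicit function of the formant frequencies. Editing a formant value and regenerating the cepstra is therefore one plausible route to the formant controllability described in Chapter 5.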
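Chapter 6 transforms formant-related and residual-related parameters separately or in combination, and compares the result against MLLR, whose mean transform has the affine form W μ + b. The Python sketch below does two things. It applies such affine transforms to two hypothetical parameter groups, and it estimates a bias-only transform in closed form under the assumption that all Gaussians share one covariance. The dissertation's actual estimation procedure is not reproduced, and all names are assumptions.

```python
def affine(W, b, v):
    """Apply v' = W v + b (MLLR-style mean transform)."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def adapt_htm(formant_targets, residual_means, formant_tf=None, residual_tf=None):
    """Hypothetical HTM adaptation step: transform formant-related and
    residual-related parameters separately (one transform given) or in
    combination (both given). Each *_tf is a (W, b) pair; None keeps the
    source speaker's parameters."""
    if formant_tf is not None:
        formant_targets = [affine(*formant_tf, t) for t in formant_targets]
    if residual_tf is not None:
        residual_means = [affine(*residual_tf, r) for r in residual_means]
    return formant_targets, residual_means


def estimate_bias(pairs):
    """ML estimate of a shared bias b (W fixed to identity) from aligned
    (observation, source mean) pairs. If all Gaussians share one covariance,
    the estimate reduces to the average difference."""
    dim = len(pairs[0][0])
    return [sum(o[d] - m[d] for o, m in pairs) / len(pairs) for d in range(dim)]


# Scale both formant targets by 10% (roughly what a shorter vocal tract does),
# leaving the residual-related parameters untouched.
W, b = [[1.1, 0.0], [0.0, 1.1]], [0.0, 0.0]
print(adapt_htm([[500.0, 1500.0]], [[0.1, -0.2]], formant_tf=(W, b)))
```

Keeping the two groups separate lets a small amount of adaptation data target formant structure and residual detail independently. A single global MLLR transform acts on the whole acoustic mean at once.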
Keywords/Search Tags: speech synthesis, hidden Markov model, articulatory features, hidden trajectory model, formant features, speaker adaptation