With the rapid development of computing and intelligent information processing, speech processing plays an increasingly important role in human-computer interaction, and speech synthesis is one of its key techniques. At present, the two most popular approaches are statistical parametric speech synthesis and unit selection with waveform concatenation. The former uses an acoustic model to predict acoustic features and reconstructs speech with a parametric synthesizer. Its advantages include smooth synthetic speech, a small system footprint, and flexible synthesis of multiple speakers and styles. The latter first selects a suitable unit sequence from a pre-recorded corpus and then concatenates the waveforms of the selected units to produce the synthetic speech. Because it uses natural waveforms from the corpus, it can generate speech of higher quality and naturalness. This paper focuses on the latter approach.

In unit selection and waveform concatenation, cost-function-based unit selection is the key step. Existing unit selection algorithms design their cost functions and selection criteria from differences in context information, distances between acoustic parameters, and the output probabilities of acoustic models. Evaluation of speech synthesis systems, however, still relies on subjective methods such as the Mean Opinion Score (MOS) and preference tests. The inconsistency between objective unit selection criteria and subjective evaluation results limits further improvement in the naturalness of unit selection and waveform concatenation.

This paper focuses on unit selection and waveform concatenation based on the hidden Markov model (HMM).
It uses subjective evaluations and feedback to make the unit selection criteria more consistent with subjective evaluation, with the aim of improving the naturalness of synthetic speech. The methods for integrating subjective evaluations and feedback into the speech synthesis system include corpus extension using perceptual data, synthetic speech error detection, and unit selection using a log likelihood ratio measure based on perceptual data.

This paper is organized as follows. The first chapter is the introduction. It briefly reviews the currently popular speech synthesis methods, elaborates the main principles and key techniques of HMM-based unit selection and waveform concatenation, and states the research purpose of this paper.

The second chapter introduces existing methods for integrating subjective evaluations and feedback into unit selection and waveform concatenation systems, and discusses the merits and drawbacks of each.

The third chapter introduces corpus extension and synthetic speech error detection based on perceptual data. First, the existing corpus is extended with speech segments labeled as natural; two extension methods, which differ in whether the acoustic models are updated with an embedded updating technique, are introduced and compared. Second, a synthetic speech error detector is built from a pronunciation space model and an SVM classifier, and the output paths of the baseline system are rescored with this detector. Experiments on Chinese place name synthesis for a navigation application verify the effectiveness of these methods.

The fourth chapter introduces the Log Likelihood Ratio (LLR) measure based on perceptual data. Natural and unnatural acoustic models are first trained using the perceptual data.
In the synthesis stage, the LLR derived from the two models guides unit selection. Experiments show that this method improves the performance of the synthesis system, and that using the LLR to rescore candidates performs better than directly substituting it for the target cost.

The fifth chapter summarizes the paper, including its main contributions and directions for future research.
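
As an illustration of the LLR-guided rescoring summarized for the fourth chapter, the following is a minimal sketch and not the system described in this paper. It assumes that each of the natural and unnatural models is a single diagonal Gaussian over a candidate path's feature vector, and that rescoring subtracts a weighted LLR from the baseline cost. The names gaussian_loglik, llr, rescore, and weight, as well as all numeric values, are hypothetical.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-likelihood of feature vector x under a diagonal Gaussian (assumed model form)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def llr(x, natural, unnatural):
    """LLR = log p(x | natural model) - log p(x | unnatural model)."""
    return gaussian_loglik(x, *natural) - gaussian_loglik(x, *unnatural)


def rescore(candidates, natural, unnatural, weight=1.0):
    """Return the candidate path minimizing baseline cost minus weight * LLR.

    The linear combination and the weight are assumptions made for illustration.
    """
    return min(
        candidates,
        key=lambda c: c["cost"] - weight * llr(c["features"], natural, unnatural),
    )


# Hypothetical (mean, variance) pairs for the two models.
natural = ([0.0, 0.0], [1.0, 1.0])
unnatural = ([3.0, 3.0], [1.0, 1.0])

# Hypothetical N-best output paths of a baseline system.
candidates = [
    {"name": "path_a", "cost": 1.0, "features": [2.8, 3.1]},
    {"name": "path_b", "cost": 1.2, "features": [0.1, -0.2]},
]

best = rescore(candidates, natural, unnatural)
print(best["name"])
```

With weight set to 0 the sketch falls back to the baseline ranking, so the effect of the LLR term can be isolated by comparing the two selections.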