Affective Speech Synthesis

Posted on: 2007-03-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z L Su
Full Text: PDF
GTID: 1118360212460397
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Speech is an ideal human-machine interface, and speech synthesis is a key technology for spoken communication between humans and machines. Since the first speech synthesizer appeared, new methods and techniques have been applied, in particular the now-prevalent combination of large raw speech databases with intelligent algorithms such as data mining. As a result, TTS (Text To Speech) systems based on pitch-synchronous overlap-add have in recent years reached a high level of clarity and naturalness, have begun to be widely used commercially, and will gradually enter people's daily lives. As synthetic speech becomes widely used, it is required to be better. Improving the expressive ability of TTS systems, especially enabling synthetic speech to express emotions as a human speaker does, accords with the developing trend of speech synthesis. However, this remains a difficult problem. As an interdisciplinary field, affective speech synthesis is a research topic of high theoretical and practical value; it has become a new direction in speech synthesis and attracts the attention of a growing number of researchers.

In order to synthesize affective speech, this dissertation focuses on the fundamental frequency (F0) of affective speech and studies F0-based affective speech modeling, an affective speech synthesizer guided by intonation rules, and some related algorithms. Based on these studies, a speech synthesis system has been completed, which not only validates the proposed modeling method but can also serve as an experimental platform for speech-processing research and provide good experimental conditions for future work.

The main innovative points of this dissertation are as follows:

(1) An F0 modeling method for affective speech is proposed based on a modified Fujisaki model, together with a novel and effective approach that extracts the model parameters automatically, without any manual label information.
The approach separates the F0 contour into a low-frequency component (LFC) and a high-frequency component (HFC) with a high-pass filter, then estimates the phrase-command parameters of the model from the LFC and the tone-command parameters from the HFC. Because of the response characteristics of the commands, a left-to-right iterative process is proposed to estimate the parameters in turn. The model expresses the F0 contour with parameters that have explicit phonetic meanings, and there is a clear relationship between the distribution of the parameters and the emotion. Compared with other models, the proposed F0 model can express the emotional features of affective speech. Furthermore, the parameter estimation method is simple and effective, in particular because it requires no manual label information.
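
As a rough illustration of the ideas in point (1), the following minimal sketch generates a log-F0 contour with the standard Fujisaki formulation and splits a sampled contour into LFC and HFC using a simple first-order high-pass filter. It is not the modified model or the estimation procedure of the dissertation, whose details the abstract does not give. All function names, the constants ALPHA, BETA and GAMMA (commonly cited defaults), the filter design, the 0.5 Hz cutoff, and the command values in the usage lines are assumptions made for illustration only.

```python
import math

# Commonly cited default constants for the standard Fujisaki model (assumptions here).
ALPHA = 3.0   # natural angular frequency of the phrase-control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent/tone-control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent/tone response


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase-control mechanism, Gp(t)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def tone_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent/tone-control mechanism, Ga(t)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_hz, phrase_cmds, tone_cmds):
    """ln F0(t) = ln Fb + phrase components + tone components.

    phrase_cmds: list of (onset_time, amplitude)
    tone_cmds:   list of (onset_time, offset_time, amplitude)
    """
    value = math.log(base_hz)
    for t0, ap in phrase_cmds:
        value += ap * phrase_response(t - t0)
    for t1, t2, aa in tone_cmds:
        value += aa * (tone_response(t - t1) - tone_response(t - t2))
    return value


def split_contour(samples, dt, cutoff_hz):
    """Split a sampled ln F0 contour into (LFC, HFC).

    HFC is the output of a first-order high-pass filter (a hypothetical
    choice); LFC is the remainder, so LFC + HFC reproduces the input.
    """
    if not samples:
        return [], []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + dt)
    hfc = [0.0]
    for n in range(1, len(samples)):
        hfc.append(a * (hfc[-1] + samples[n] - samples[n - 1]))
    lfc = [x - h for x, h in zip(samples, hfc)]
    return lfc, hfc


# Usage with made-up commands: one phrase command and one tone command.
dt = 0.01
contour = [log_f0(i * dt, 120.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)])
           for i in range(150)]
lfc, hfc = split_contour(contour, dt, cutoff_hz=0.5)
```

In a scheme like the one described, the slowly varying LFC would be matched against phrase-command responses and the faster HFC against tone-command responses; the fitting step itself is omitted here because the abstract does not specify it.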
Keywords/Search Tags: Affective computing, speech synthesis, F0 contour, intonation model, affective speech synthesis