Affective Speech Synthesis

Posted on:2007-03-25

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z L Su

Full Text:PDF

GTID:1118360212460397

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

Speech is an ideal human-machine interface, and speech synthesis is a key technology for spoken communication between humans and machines. Since the first speech synthesizer appeared, new methods and techniques have steadily been applied to the field, most notably the combination of large raw speech databases with intelligent algorithms such as data mining. As a result, TTS (Text To Speech) systems based on pitch-synchronous overlap-add have in recent years reached a high level of clarity and naturalness, have begun to be widely used commercially, and are gradually entering everyday life. With wider use comes a demand for better synthetic speech. Improving the expressive ability of TTS systems, and in particular enabling synthetic speech to express emotion as a human speaker does, is in line with the development trend of speech synthesis, but it remains a difficult open problem. Affective speech synthesis is an interdisciplinary research topic of high theoretical and practical value; it has become a new direction in speech synthesis and is attracting a growing number of researchers.

To synthesize affective speech, this dissertation focuses on the fundamental frequency (F0) of affective speech. It studies F0-based affective speech modeling, an affective speech synthesizer guided by intonation rules, and several related algorithms. Building on these studies, a speech synthesis system was completed. The system validates the proposed modeling method and can also serve as an experimental platform for research related to speech processing, providing good experimental conditions for future work.

The main innovative points of this dissertation are as follows:

(1) An F0 modeling method for affective speech is proposed, based on a modified Fujisaki model, together with a novel and effective approach for extracting the model parameters automatically, without any manual label information. The approach uses a high-pass filter to separate the F0 contour into a low-frequency component (LFC) and a high-frequency component (HFC), then estimates the phrase-command parameters of the model from the LFC and the tone-command parameters from the HFC. Because of the response characteristics of the commands, a left-to-right iterative process is used to estimate the parameters one after another. The model expresses the F0 contour with parameters that have explicit phonetic meanings, and there is a clear relationship between the distribution of those parameters and the emotion. Compared with other models, the proposed F0 model can express the emotional features of affective speech. Furthermore, the parameter estimation method is simple and effective, and in particular requires no manual label information.
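
The decomposition step described in (1) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract gives neither the filter design nor the modifications made to the Fujisaki model, so the sketch assumes a moving-average smoother in the log-F0 domain (the HFC is what remains after subtracting the smoothed contour, which acts as a simple high-pass filter) and uses the phrase and accent/tone response functions of the standard Fujisaki model. The function names, the window size, the values of alpha, beta and gamma, and the synthetic test contour are all illustrative assumptions. The left-to-right iterative parameter estimation is not shown.

```python
import math


def moving_average(values, half_width):
    """Centered moving average, truncated at the edges (a crude low-pass filter)."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def split_f0(f0_hz, half_width=25):
    """Split a gap-free F0 contour (Hz) into LFC and HFC in the log domain.

    Assumption: HFC = log F0 - LFC, i.e. high-pass by subtracting a smoothed copy.
    """
    log_f0 = [math.log(f) for f in f0_hz]
    lfc = moving_average(log_f0, half_width)
    hfc = [x - low for x, low in zip(log_f0, lfc)]
    return lfc, hfc


def phrase_response(t, alpha=2.0):
    """Phrase-command impulse response of the standard Fujisaki model."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def tone_response(t, beta=20.0, gamma=0.9):
    """Accent/tone-command step response of the standard Fujisaki model."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def synthetic_contour(n_frames=200, frame_s=0.01, base_hz=120.0):
    """Hypothetical contour: one phrase command at 0 s, one tone command 0.5-0.8 s."""
    contour = []
    for i in range(n_frames):
        t = i * frame_s
        log_f0 = (math.log(base_hz)
                  + 0.5 * phrase_response(t)
                  + 0.3 * (tone_response(t - 0.5) - tone_response(t - 0.8)))
        contour.append(math.exp(log_f0))
    return contour


f0 = synthetic_contour()
lfc, hfc = split_f0(f0)
print(len(lfc), len(hfc))
```

In this sketch the slowly varying phrase component ends up mostly in the LFC and the short tone command mostly in the HFC, which is the property the abstract relies on when it estimates phrase commands from the LFC and tone commands from the HFC. A real F0 track would also need interpolation across unvoiced frames before filtering, which is omitted here.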

Keywords/Search Tags:

Affective computing, speech synthesis, F0 contour, intonation model, affective speech synthesis

Related items

1	Research On Affective Speech Synthesis
2	An Improved Speech Synthesis Method
3	The Research And Application Of Speech Affective Computing
4	Research On Statistical Parametric Mandarin-Tibetan Cross-lingual Speech Synthesis
5	Create An Emotional Speech Synthesis Corpus
6	A Study On Speech Synthesis And Visual Speech Synthesis Based On Neural Networks
7	Research And Implementation Of Speech Synthesis Method For Helping Old Robots
8	Key Technologies For Text-to-speech Systems
9	Based Hmm Can Be Training Vietnamese Speech Synthesis System
10	Research Of Embedded Speech Synthesis Technology