
Prosody Extraction And Description Of Chinese Mandarin Continuous Speech

Posted on: 2008-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: W W Wang
GTID: 2178360212493460
Subject: Circuits and Systems
Abstract/Summary:
Speech is one of the most important means of exchanging information. Since the 1990s, speech technology in China has grown rapidly. With the fast development of Chinese speech coding, recognition, synthesis, conversion and other speech techniques, related products have come into wide use in many areas. Speech communication is no longer limited to speaking and listening with the voice as the only medium. With multiple media such as intelligent terminals and the Internet, information can be expressed and delivered more conveniently and quickly by combining speech, text and images.

Much implicit information in human communication is conveyed through prosody rather than through the words themselves, so prosodic modeling plays an important role in speech synthesis systems. In an embedded Text-to-Speech (TTS) system, limited resources make it impossible to store units that fully match every context, so it is difficult to make the synthesized speech sound natural. This thesis investigates combining speech analysis with text analysis to obtain prosodic information. Prosody extraction from speech and/or text, together with labeling, can be carried out on servers, leaving information parsing and speech synthesis to intelligent terminals such as mobile phones and PDAs.

The aim of this thesis is to improve the naturalness of synthesized speech through the processing and analysis of speech. Prosody is obtained in the following steps. First, basic phonetic processing is applied to Mandarin continuous speech. Second, prosodic information is extracted from the continuous speech based on the acoustic characteristics of prosody. Finally, the prosodic information obtained from speech is labeled in a standard way.

In the basic phonetic processing stage, syllables are segmented, F0 is tracked, and the F0 contour of each syllable is smoothed.
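The per-syllable F0 smoothing step can be illustrated with a minimal sketch. The abstract does not say which smoothing method the thesis uses, so the median filter below is an assumption, as are the function names, the window size, the convention that 0.0 marks an unvoiced frame, and the sample values.

```python
from statistics import median


def smooth_f0(contour, window=5):
    """Median-smooth one syllable's F0 contour (Hz).

    Assumption: a value of 0.0 marks an unvoiced frame.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i, value in enumerate(contour):
        if value <= 0.0:
            smoothed.append(0.0)  # leave unvoiced frames untouched
            continue
        # Use only voiced neighbours inside the window.
        neighbours = [v for v in contour[max(0, i - half): i + half + 1] if v > 0.0]
        smoothed.append(median(neighbours))
    return smoothed


def smooth_syllables(f0_track, boundaries, window=5):
    """Smooth each syllable separately so values do not leak across boundaries."""
    return [smooth_f0(f0_track[start:end], window) for start, end in boundaries]


# Hypothetical frame-level F0 track with one octave-doubling error (420 Hz)
# and one octave-halving error (90 Hz), split into two syllables.
f0_track = [0.0, 210.0, 212.0, 420.0, 214.0, 216.0,
            0.0, 180.0, 178.0, 90.0, 176.0, 174.0]
boundaries = [(0, 6), (6, 12)]  # (start, end) frame indices per syllable
smoothed = smooth_syllables(f0_track, boundaries)
```

A median filter is chosen here because it removes isolated octave errors without blurring the overall shape of the tone contour; other smoothers would serve equally well as an illustration.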
For prosodic information extraction, syllable tone, prosodic boundaries, stress and intonation are recognized. Hidden Markov models (HMMs) are used for both tone recognition and intonation classification, and a neural network is used for stress detection. The prosody extraction results largely agree with perceptual judgments obtained in listening tests. Finally, SSML 1.0 is extended and used for prosody labeling so that the prosodic information can be described accurately.
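As a rough illustration of SSML-based prosody labeling, the sketch below emits standard SSML 1.0 markup for stress and prosodic boundaries. The thesis's own extensions to SSML are not described in the abstract, so only the standard `emphasis` and `break` elements are used; the helper names, the input format and the sample sentence are assumptions.

```python
from xml.sax.saxutils import escape, quoteattr

# Attribute values defined by SSML 1.0.
BREAK_STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
EMPHASIS_LEVELS = {"strong", "moderate", "none", "reduced"}


def label_word(text, emphasis=None, break_after=None):
    """Return the SSML fragment for one prosodic word (hypothetical helper)."""
    body = escape(text)
    if emphasis is not None:
        if emphasis not in EMPHASIS_LEVELS:
            raise ValueError(f"unknown emphasis level: {emphasis}")
        body = f"<emphasis level={quoteattr(emphasis)}>{body}</emphasis>"
    if break_after is not None:
        if break_after not in BREAK_STRENGTHS:
            raise ValueError(f"unknown break strength: {break_after}")
        body += f"<break strength={quoteattr(break_after)}/>"
    return body


def build_ssml(words, lang="zh-CN"):
    """Wrap the labeled words in an SSML 1.0 <speak> root element."""
    inner = "".join(label_word(**word) for word in words)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>{inner}</speak>"
    )


# Hypothetical extraction result: stress on the first word,
# a weak boundary after it and a medium boundary after the second word.
words = [
    {"text": "今天", "emphasis": "strong", "break_after": "weak"},
    {"text": "天气", "break_after": "medium"},
    {"text": "很好"},
]
ssml = build_ssml(words)
```

Tone and intonation are left out of this sketch because standard SSML 1.0 has no dedicated element for lexical tone, which is presumably one reason the thesis extends the language.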
Keywords/Search Tags:prosody extraction, syllable separation, SSML