
The Research Of Speech Synthesis And Prosody Control In Wu-Dialect Text-to-Speech

Posted on: 2003-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Duan
GTID: 2168360065460352
Subject: Communication and Information System
Abstract/Summary:
In recent years, text-to-speech (TTS), which converts text into speech by computer, has been an active research topic in speech signal processing. The output of a successful TTS system should be clear and fluent, so the system needs a good speech-synthesis module. However, speech synthesized by simply concatenating the recordings of single words lacks naturalness, which depends on how the tone varies across an utterance. In continuous speech, the sound of a word is affected not only by its own pronunciation but also by the tones of the adjacent words. A TTS system should therefore analyze the text first, determine from the context how the tone of each word varies, and use that variation to control synthesis. Text analysis, prosody control, and speech synthesis are thus the three core modules of a TTS system.
China is a multi-dialect country, and many dialects have a long history and are spoken over considerably large areas. In contrast to Mandarin, the dialects differ substantially in pronunciation, grammar, and word usage. Research on dialect TTS is therefore meaningful both scientifically and in application.
Chinese speech is characterized by three factors: initial, final, and tone. Mandarin and the dialects share these characteristics. Judged by the shape of the time-domain waveform, Chinese speech can be divided into two kinds of sections. One is the unstable section, which changes rapidly and irregularly. The other is the stable section, which is clearly periodic, with a period that changes slowly. The unstable section lies mainly in the initial and at the end of the final; the stable section lies in the middle of the final. The tone of speech is determined by the change of its period (or frequency).
By measuring the change of the period and analyzing it statistically, rules for tone change in continuous speech can be summarized. These rules can be used to control and correct the prosody and tone of synthesized speech so as to improve its naturalness and intelligibility. This thesis presents original work on speech synthesis and prosody control in a Wu-dialect TTS system. It proposes a new speech synthesis method that works on the time-domain waveform, called Pitch-Synchronous Frame Concatenation (PSFC), and a new prosody control method based on PSFC. For the text analysis module, the methods of Mandarin TTS systems can be borrowed, so text analysis is not the main work of this thesis. Using these methods, text-to-speech for the Wu dialect is achieved. According to the experiments, the methods are concise and practical, and the results are satisfactory in both naturalness and intelligibility.
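The abstract does not say how the pitch period is measured, so the following is only a minimal sketch of the general idea of tracking the period over the stable (periodic) section of a syllable. It assumes a standard normalized-autocorrelation estimator, which is a substitute technique and not the thesis's PSFC method. The function names, the 8 kHz sample rate, and the lag range are all hypothetical choices made for illustration.

```python
import math


def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation.

    Assumption: the frame is taken from a voiced, roughly periodic section.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(frame) - lag
        num = sum(frame[i] * frame[i + lag] for i in range(n))
        e1 = sum(frame[i] ** 2 for i in range(n))
        e2 = sum(frame[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(e1 * e2)
        score = num / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def period_contour(signal, frame_len, hop, min_lag, max_lag):
    """Estimate one pitch period per frame; the sequence approximates the tone contour."""
    return [
        estimate_pitch_period(signal[s:s + frame_len], min_lag, max_lag)
        for s in range(0, len(signal) - frame_len + 1, hop)
    ]


def synth_tone(sample_rate, f_start, f_end, duration):
    """Synthetic stand-in for a voiced segment whose frequency glides linearly."""
    n = int(sample_rate * duration)
    phase, out = 0.0, []
    for i in range(n):
        f = f_start + (f_end - f_start) * i / n
        phase += 2 * math.pi * f / sample_rate
        # Fundamental plus a weaker second harmonic, to be slightly speech-like.
        out.append(math.sin(phase) + 0.5 * math.sin(2 * phase))
    return out


# Hypothetical rising tone: 160 Hz to 250 Hz over 0.3 s at 8 kHz.
rising = synth_tone(8000, 160.0, 250.0, 0.3)
contour = period_contour(rising, frame_len=240, hop=80, min_lag=20, max_lag=60)
```

For a rising tone the estimated period shrinks from frame to frame, which is the kind of period change the abstract describes measuring. The lag range here is kept narrow enough that twice the shortest period falls outside it, which sidesteps octave errors in this toy case; real speech would need a more careful estimator.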
Keywords/Search Tags: Text-to-Speech (TTS), text analysis, prosody control, Wu-dialect speech synthesis, Pitch-Synchronous Frame Concatenation (PSFC), Pitch period