Trainable Chinese Speech Synthesis System Based On HMM

Posted on: 2011-03-02 Degree: Master Type: Thesis
Country: China Candidate: J Nie
GTID: 2178360305455342 Subject: Software engineering
Abstract/Summary:
Speech synthesis technology enables spoken human-computer communication and is one of the key technologies needed to build a spoken dialogue system. Speech synthesis methods fall into two categories: (1) waveform-coding synthesis and (2) parametric analysis-synthesis. This thesis is based on parametric analysis-synthesis. Its advantages are that the model can be very small, its parameters can be adjusted flexibly, and it can synthesize new voices with different speaker characteristics. The method analyzes the speech signal to obtain parameters such as LPC (linear predictive coding) coefficients, LSFs (line spectral frequencies) and formants. At output time, the synthesis parameters corresponding to the message to be spoken are retrieved from the speech library, edited and concatenated, and sent in sequence to the synthesizer, which, under the control of these parameters, reconstructs the speech waveform frame by frame. The method therefore does not play back recorded speech directly; it uses parameters extracted from speech that capture its characteristics, and during synthesis these parameters are computed and controlled by the corresponding mathematical model.
The Hidden Markov Model (HMM) has been used in speech signal processing for more than two decades and is a mature technique. It is an extension of the Markov chain: real problems are more complex than a Markov chain can describe, because the observations do not correspond one-to-one to the states but are related to them through a set of probability distributions; such a model is called a hidden Markov model. An HMM is a doubly stochastic process. The first process is a Markov chain, the basic random process, which describes the state transition probabilities; the second describes the statistical relationship between the states and the observed values.
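To make the "doubly stochastic process" concrete, here is a minimal sketch of the forward algorithm (the forward half of the Forward-Backward algorithm) for an HMM with discrete outputs. It is an illustration only, not code from this thesis, which relies on HTK and HTS; the function names and toy parameters are hypothetical. The transition matrix `A` plays the role of the hidden Markov chain and the emission matrix `B` the state/observation relationship. A brute-force version that enumerates every hidden state path is included to show that the forward recursion computes the same probability in O(N²T) instead of O(N^T) time.

```python
from itertools import product


def forward_probability(pi, A, B, obs):
    """P(obs | model) for a discrete-output HMM, computed with the forward algorithm.

    pi[i]   -- probability of starting in state i
    A[i][j] -- state-transition probability i -> j (the hidden Markov chain)
    B[i][k] -- probability of observing symbol k in state i (state/observation relation)
    obs     -- sequence of observed symbol indices
    """
    n = len(pi)
    # alpha[i] = P(o_1 .. o_t, state at time t = i); initialise with t = 1
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Sum over every predecessor state, then emit the next observation
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


def brute_force_probability(pi, A, B, obs):
    """Same quantity obtained by enumerating every hidden state path (N**T paths)."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


if __name__ == "__main__":
    # Hypothetical 2-state, 3-symbol model
    pi = [0.6, 0.4]
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    obs = [0, 1, 2]
    print(forward_probability(pi, A, B, obs), brute_force_probability(pi, A, B, obs))
```

The two printed values agree; only the forward version remains practical as the number of states and frames grows.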
Three important algorithms are associated with the HMM: the Forward-Backward algorithm, the Viterbi algorithm and the Baum-Welch algorithm. Building on HMMs and using the toolkits HTK 3.4 (Hidden Markov Model Toolkit) and HTS 2.0 (HMM-based Speech Synthesis System), this thesis designs and implements an HMM-based Chinese speech synthesis system. The main work is as follows:
1. Analysis of Chinese text and prosody. This covers character-to-pronunciation conversion and prosodic analysis. A system for converting Chinese characters to pronunciation was designed and implemented; it converts Chinese characters to Pinyin and handles the conversion of numbers and dates to Pinyin particularly well. The prosodic features of Chinese were then analyzed, the acoustic realization of the prosodic parameters was described, and prosodic rules were summarized.
2. Research on the key technologies of the synthesis system. The Chinese synthesis system is divided into a training part and a synthesis part. The main tasks of the training part are configuring the modeling parameters, building the speech corpus, labeling the text files, building the context attribute set, and designing the question set. Because the corpus is drawn from texts of one particular type, a small amount of data suffices for adequate training, and sentences of the same type can be synthesized well. Labeling is done in two stages: part of the data is first labeled by hand to train initial models, and the trained models are then used to label the data automatically. The context attribute set is designed according to Chinese prosodic rules, the question set is designed from the context attribute set, and decision-tree-based clustering is used to deal with the excessive number of triphone model states. The main tasks of the synthesis part are data preparation, building the sentence HMM, and generating the speech.
The data used in synthesis consist of the label file produced by analyzing the text to be synthesized, the decision-tree files produced by training, and the mel-cepstral, fundamental frequency (F0) and duration model files. Searching the decision trees with the initial (consonant) and final (vowel) phonemes yields the HMMs for state duration, pitch period and spectrum, from which the sentence HMM is built. Finally, the parameter sequence of the sentence HMM is predicted from the trained mel-cepstral, F0 and duration models of each phoneme, and the speech is synthesized by an MLSA (Mel Log Spectrum Approximation) synthesizer driven by the spectral and F0 parameters that have the maximum output probability.
3. Implementation of the HMM-based Chinese speech synthesis system. Windows XP was chosen as the development platform. The modeling settings were specified first: acoustic parameters, unit size, topology, number of states and number of Gaussian mixture components. Works of contemporary literature were selected as the corpus text and recorded. HTK commands were used to train monophone models and to perform automatic labeling, and the context labels were designed and produced with the Praat tool together with the context attributes. The labeled text and speech files were then fed to HTS 2.0 for training. Under the control of the Training.pl script, the following steps are carried out in turn: computing the global variance, estimating the variance floor, initializing and training the monophone models, training the context-dependent models, clustering the models with decision trees, and re-training the clustered models and modeling durations. After training, synthesis experiments were carried out on sentences inside and outside the training set, with hts_engine synthesizing speech from the generated parameters.
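The synthesis stage turns the sentence HMM into a frame sequence using the duration models. A minimal sketch, assuming one Gaussian duration model (mean m_k, variance v_k) per state, uses the duration rule common in HMM-based synthesis, d_k = m_k + ρ·v_k with ρ = (T − Σm_k) / Σv_k, where ρ = 0 gives the mean durations and a target total T makes speech faster or slower. It then repeats each state's mean output vector for its duration. Ignoring dynamic (delta) features, this mean sequence is the one with maximum output probability; the parameter generation in HTS also uses delta features and yields smooth trajectories, which this sketch deliberately omits. The function names are hypothetical.

```python
def state_durations(means, variances, total_frames=None):
    """Number of frames for each HMM state from Gaussian state-duration models.

    d_k = m_k + rho * v_k. With total_frames=None, rho = 0 and each state gets
    its mean duration; otherwise rho = (T - sum(m)) / sum(v), so the unrounded
    durations sum to T (rounding and the one-frame minimum can shift the total).
    """
    rho = 0.0
    if total_frames is not None:
        rho = (total_frames - sum(means)) / sum(variances)
    # Every state must last at least one frame
    return [max(1, round(m + rho * v)) for m, v in zip(means, variances)]


def mean_trajectory(state_means, durations):
    """Frame-by-frame parameter sequence: each state's mean repeated for its duration.

    Without dynamic (delta) features, the mean maximizes each frame's Gaussian
    output probability, so this is the maximum-probability sequence.
    """
    frames = []
    for mean, d in zip(state_means, durations):
        frames.extend(list(mean) for _ in range(d))
    return frames


if __name__ == "__main__":
    # Hypothetical 3-state phoneme: duration means/variances in frames
    means, variances = [3.0, 5.0, 2.0], [1.0, 2.0, 1.0]
    print(state_durations(means, variances))                   # mean durations
    print(state_durations(means, variances, total_frames=14))  # slower speech
    print(mean_trajectory([[1.0], [2.0], [3.0]], state_durations(means, variances)))
```

Scaling by the variance means that states whose duration is less certain absorb more of the change when the overall speaking rate is adjusted.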
4. Refinement of the Chinese stress model. Experimental results showed that stress was not reproduced effectively, so the Chinese stress (accent) model was refined. Stress is labeled hierarchically at the levels of the rhythmic word, the prosodic word and the prosodic phrase. When the label file for synthesis is analyzed for stress, the words, phrases or syllables that need emphasis are identified, and stress is realized by modifying their duration and fundamental frequency.
Future work will design a system that automatically determines Chinese prosodic boundaries and predicts stress accurately, and on that basis build the front end of the HMM-based Chinese speech synthesis system, ultimately yielding a complete HMM-based Chinese speech synthesis system.
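The stress refinement works by changing the duration and fundamental frequency of the syllables to be emphasized. The sketch below is a hypothetical illustration of that idea, not the thesis's implementation: it stretches a syllable's voiced F0 contour in time with linear interpolation and then raises it. The `Syllable` class, the `emphasize` function and the default factors 1.3 (duration) and 1.15 (F0) are assumptions chosen for illustration, not values reported in the thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    pinyin: str
    f0: List[float]  # non-empty voiced F0 contour in Hz, one value per frame


def resample(values, new_len):
    """Linearly interpolate a contour to new_len frames (end points kept when new_len > 1)."""
    n = len(values)
    if n == 1 or new_len == 1:
        return [values[0]] * new_len
    out = []
    for i in range(new_len):
        pos = i * (n - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def emphasize(syl, duration_scale=1.3, f0_scale=1.15):
    """Return a stressed copy of syl: more frames and a raised F0 (hypothetical factors)."""
    new_len = max(1, round(len(syl.f0) * duration_scale))
    stretched = resample(syl.f0, new_len)
    return Syllable(syl.pinyin, [v * f0_scale for v in stretched])


if __name__ == "__main__":
    plain = Syllable("hao3", [200.0, 180.0, 160.0, 170.0])
    stressed = emphasize(plain, duration_scale=1.5, f0_scale=1.1)
    print(len(plain.f0), len(stressed.f0), [round(v, 1) for v in stressed.f0])
```

Interpolating before scaling preserves the shape of the syllable's tone contour, which matters in Chinese because the contour carries lexical tone; only its length and overall height change.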
Keywords/Search Tags: HMM, Speech Synthesis, Chinese Speech, Accent