
Research On Tibetan Lhasa Speech Synthesis Based On HMM

Posted on: 2015-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: J X Zhang
GTID: 2298330467474444
Subject: Computer application technology
Abstract/Summary:
This thesis takes Lhasa Tibetan as its object of study and implements speech synthesis for the Lhasa dialect. The system is built on the HMM-based (Hidden Markov Model) trainable speech synthesis framework (Trainable TTS) and covers the whole pipeline, from data preparation through model training to parameter generation and synthesis. The main work and results are as follows.

First, a small Tibetan Lhasa dialect speech corpus was built. Taking into account the characteristics of Lhasa Tibetan consonants, vowels and tones, approximately 2,000 declarative sentences were selected from the speech data of Tibet Daily for the synthesis experiments. After word segmentation, part-of-speech tagging and prosodic phrase boundary labelling of the selected sentences, phonemes and prosody were annotated with the Praat software, and programs were written to produce time-stamped monophone and triphone label files.

Secondly, automatic phoneme segmentation algorithms based on monophone and triphone models were studied, and the selected sentences were labelled with the triphone-based automatic phoneme segmentation algorithm.
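The time-stamped monophone and triphone label files described above can be illustrated with a minimal Python sketch. It assumes HTK/HTS-style label lines of the form "start end label", with times in units of 100 ns, and the "left-centre+right" triphone notation; the thesis does not state its exact file format, and the function names, the "x" padding symbol and the example phone symbols are hypothetical. Reading intervals out of a Praat TextGrid is omitted.

```python
from typing import List, Tuple

# One annotated interval: (start time in s, end time in s, phoneme symbol),
# e.g. as read from a Praat phoneme tier (parsing not shown; assumed input).
Interval = Tuple[float, float, str]

# HTK/HTS-style label files express times in units of 100 ns.
UNITS_PER_SECOND = 10_000_000


def to_htk_time(seconds: float) -> int:
    """Convert seconds to an integer count of 100-ns units."""
    return int(round(seconds * UNITS_PER_SECOND))


def monophone_labels(intervals: List[Interval]) -> List[str]:
    """Return one 'start end phone' line per interval."""
    return [f"{to_htk_time(s)} {to_htk_time(e)} {p}" for s, e, p in intervals]


def triphone_labels(intervals: List[Interval], pad: str = "x") -> List[str]:
    """Return one 'start end left-phone+right' line per interval.

    'pad' is a hypothetical placeholder for the missing left context of the
    first phone and the missing right context of the last phone.
    """
    phones = [p for _, _, p in intervals]
    lines = []
    for i, (s, e, p) in enumerate(intervals):
        left = phones[i - 1] if i > 0 else pad
        right = phones[i + 1] if i + 1 < len(phones) else pad
        lines.append(f"{to_htk_time(s)} {to_htk_time(e)} {left}-{p}+{right}")
    return lines
```

For the intervals (0.00, 0.12, "sil"), (0.12, 0.20, "t"), (0.20, 0.35, "a"), (0.35, 0.50, "sil"), triphone_labels returns "0 1200000 x-sil+t", "1200000 2000000 sil-t+a", "2000000 3500000 t-a+sil" and "3500000 5000000 a-sil+x"; the monophone file carries the same times with the bare phone names.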
The segmentation results of the two HMM-based algorithms were tested and analyzed: the overall average segmentation accuracy was 80.69% for the monophone algorithm and 88.74% for the triphone algorithm. The triphone-based algorithm is therefore clearly more accurate than the monophone-based one, and using it improved the accuracy and consistency of the corpus annotation.

Third, based on the grammatical structure, prosody and phonetic features of Tibetan, context-dependent attribute rules and question sets for decision-tree clustering were designed, the context information was labelled, and the generalized Mel-cepstral data were obtained.

Finally, speech synthesis of the Lhasa dialect was implemented. With the Tibetan phoneme as the basic synthesis unit, the acoustic models were trained with the HMM-based trainable speech synthesis framework after extracting fundamental frequency, duration and MFCC parameters. The synthesized speech was evaluated with objective and subjective tests, and suggestions for improvement were made. The synthesized speech received an average MOS (mean opinion score) of 2.33.

In short, the synthesized Tibetan speech has a certain degree of intelligibility and recognizability, which lays a foundation for the research and development of Tibetan speech synthesis systems.
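The segmentation accuracy figures reported above can be read as agreement between automatic and manual phoneme boundaries. The abstract does not state the evaluation criterion, so the following Python sketch assumes a common one: an automatic boundary counts as correct if it lies within a tolerance (here a hypothetical 20 ms) of the corresponding manual boundary, and per-utterance percentages are averaged into an overall figure. The function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def boundary_accuracy(manual: Sequence[float], auto: Sequence[float],
                      tolerance: float = 0.020) -> float:
    """Percentage of automatic boundaries (in seconds) within `tolerance`
    of the matching manual boundary.

    Assumes both sequences list the boundaries of the same phoneme
    sequence in the same order, so they can be compared one-to-one.
    """
    if len(manual) != len(auto):
        raise ValueError("boundary sequences must be aligned one-to-one")
    if not manual:
        raise ValueError("no boundaries to compare")
    hits = sum(1 for m, a in zip(manual, auto) if abs(m - a) <= tolerance)
    return 100.0 * hits / len(manual)


def average_accuracy(utterances: List[Tuple[Sequence[float], Sequence[float]]],
                     tolerance: float = 0.020) -> float:
    """Mean of the per-utterance accuracies (an assumed averaging scheme)."""
    if not utterances:
        raise ValueError("no utterances to evaluate")
    scores = [boundary_accuracy(m, a, tolerance) for m, a in utterances]
    return sum(scores) / len(scores)
```

With manual boundaries [0.12, 0.20, 0.35] and automatic ones [0.125, 0.26, 0.34], two of the three fall within 20 ms, giving about 66.7%. Pooling all boundaries across utterances instead of averaging per utterance is an equally plausible reading of "overall average" and can give a different number.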
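The context-dependent questions for decision-tree clustering mentioned above work by testing each context label against a pattern, and a tree node splits its models into a "yes" group and a "no" group. The Python sketch below assumes HTS-style questions written as wildcard patterns over triphone labels in "left-centre+right" form; the question names and phone classes are hypothetical placeholders rather than the thesis's actual Tibetan question set, and the likelihood-based choice of the best question at each node is omitted.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

# Hypothetical question set in the spirit of HTS "QS" entries: each question
# name maps to wildcard patterns matched against a triphone label "l-c+r".
QUESTIONS: Dict[str, List[str]] = {
    "C-Vowel": ["*-a+*", "*-i+*", "*-u+*", "*-e+*", "*-o+*"],
    "L-Silence": ["sil-*"],
    "R-Silence": ["*+sil"],
}


def answer(question: str, label: str) -> bool:
    """True if the label matches any pattern of the named question."""
    return any(fnmatchcase(label, pattern) for pattern in QUESTIONS[question])


def split(labels: List[str], question: str) -> Tuple[List[str], List[str]]:
    """Partition labels into (yes, no) groups, as one tree node would."""
    yes = [label for label in labels if answer(question, label)]
    no = [label for label in labels if not answer(question, label)]
    return yes, no
```

For the labels "x-sil+t", "sil-t+a", "t-a+sil" and "a-sil+x", splitting on "C-Vowel" puts only "t-a+sil" in the "yes" group. A real Tibetan question set would also need questions on tone and prosodic position, which plain triphone labels do not encode.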
Keywords/Search Tags: speech synthesis, Hidden Markov Model, Lhasa Tibetan, model training