Research On Statistical Parametric Speech Synthesis Of Tibetan Lhasa Dialect

Posted on: 2013-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
GTID: 2248330392950811
Subject: Circuits and Systems
Abstract/Summary:
With the rapid development of human-computer speech interaction technology, state-of-the-art Text-to-Speech (TTS) synthesis systems can produce highly intelligible and natural synthesized speech that meets people's practical needs. In China, Mandarin and Cantonese TTS are applied in education, communication, and other fields. China is a multi-ethnic country with many minority languages and dialects. Tibetans are one of China's ancient ethnic minorities and have their own language and culture. However, because of the differences between languages, there has been little research on speech synthesis for Chinese minority languages such as Tibetan or for Chinese dialects. To address this deficiency, this thesis focuses on speech synthesis for the Lhasa dialect of Tibetan. We design a machine-readable phonetic alphabet for Tibetan, named SAMPA-T, to label the pronunciation of the Lhasa dialect. A word-to-SAMPA-T conversion algorithm is implemented to transform Tibetan text into SAMPA-T. A large speech corpus of the Lhasa dialect is collected and full context-dependent labels are obtained. A question set is built based on the phonetic features of the Lhasa dialect. We implement a Hidden Markov Model (HMM) based statistical parametric speech synthesis system. The main work and contributions are as follows:

Firstly, a speech corpus of the Lhasa dialect is built by analyzing its acoustic features, rhyme, and tone characteristics with the "Tibetan dialect survey word list". The corpus includes 600 monosyllabic words, 400 two-syllable words, and 1000 utterances. The corpus can be used not only for experimental phonetics studies, but also for engineering studies of the Tibetan Lhasa dialect such as prosody modeling, speech synthesis, and speech conversion.

Secondly, a SAMPA-T label set is designed for labeling the pronunciation of the Lhasa dialect of Tibetan. The Speech Assessment Methods Phonetic Alphabet (SAMPA) is a computer-readable phonetic alphabet that uses ASCII characters to represent the pronunciation of a language. We propose a set of SAMPA labels (named SAMPA-T) for Tibetan. The SAMPA labels of consonants and vowels are listed alongside the International Phonetic Alphabet (IPA) for Tibetan. The thesis also implements grapheme-to-phoneme conversion for Tibetan using SAMPA-T. The proposed SAMPA-T can be applied to Tibetan speech synthesis and other Tibetan speech information processing.

Thirdly, an HMM-based statistical parametric speech synthesis system is implemented to synthesize the Tibetan Lhasa dialect. We design a set of questions and a labeling format according to the pronunciation characteristics of the Lhasa dialect. Tibetan sentences are labeled with full context-dependent information. We use initials and finals as the units for training the mono-unit models. The mono-unit models are then clustered by a context-dependent clustering method with the question set. In the synthesis stage, speech parameters are generated from HMMs concatenated according to the decision tree. The synthesized speech is evaluated subjectively. The average Mean Opinion Score (MOS) reaches 3.7.
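As a rough illustration of the context-dependent labeling step mentioned above, the following Python sketch expands a sequence of initial/final units into left-current+right context labels. It is only a minimal sketch under stated assumptions: the "l-c+r" notation is a convention commonly seen in HMM toolkits and is not confirmed to be the thesis's labeling format, the function name `context_labels` and the boundary symbol `sil` are hypothetical, and the unit names `I1`, `F1`, `I2`, `F2` are placeholders rather than actual SAMPA-T symbols. Full context-dependent labels in practice would carry further information (for example tone and position features) that is omitted here.

```python
from typing import List


def context_labels(units: List[str], pad: str = "sil") -> List[str]:
    """Expand a unit sequence into 'left-current+right' context labels.

    `pad` is a hypothetical boundary symbol placed before the first unit
    and after the last unit so that every unit has two neighbours.
    """
    padded = [pad] + list(units) + [pad]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    # Placeholder units: I = initial, F = final (not real SAMPA-T symbols).
    for label in context_labels(["I1", "F1", "I2", "F2"]):
        print(label)
```

Units that share the same centre but differ in their neighbours receive distinct labels, which is what allows a question set to group similar contexts during decision-tree clustering.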
Keywords: Tibetan Pinyin, SAMPA-Tibetan, Grapheme-to-phoneme conversion, Parametric speech synthesis