Research On Statistical Parametric Mandarin-Tibetan Cross-lingual Speech Synthesis

Posted on: 2016-09-18; Degree: Master; Type: Thesis
Country: China; Candidate: H Y Wang
GTID: 2308330470980035; Subject: Circuits and Systems
Abstract/Summary:
In recent years, multilingual speech processing has become an important research direction in intelligent speech information processing. Cross-lingual speech synthesis technology makes it possible to synthesize speech in different languages with a single system. China is a multi-ethnic country with many minority languages and dialects, so research on cross-lingual speech synthesis is significant for advancing speech processing technology. However, there is not yet a complete speech synthesis system that handles Mandarin together with a minority language or a dialect. To address this, the thesis selects Mandarin and the Lhasa dialect of Tibetan and studies the characteristics and similarities of their pronunciation. A phonetic alphabet is designed to label their initials and finals, a context-dependent label format is designed to label their context information, and a question set is built for context-dependent decision tree clustering. Based on the traditional hidden Markov model (HMM), a set of mixed-lingual average models is trained from a large multi-speaker Mandarin corpus and a small single-speaker Tibetan corpus using speaker adaptive training (SAT). A speaker adaptation transformation is then applied to train a speaker-dependent (SD) model from a small amount of Tibetan or Mandarin data. Finally, a Mandarin-Tibetan cross-lingual speech synthesis system is realized that can synthesize either Mandarin or Tibetan speech. The main work and original contributions of the thesis are as follows:
Firstly, a computer-readable phonetic alphabet oriented to statistical parametric speech synthesis is designed for Tibetan and Mandarin. Based on the similarities between Mandarin and Tibetan in the pronunciation of consonants, vowels and tones, the thesis adopts the Speech Assessment Methods Phonetic Alphabet (SAMPA) to label the pronunciation of their initials and finals, which realizes the conversion of text to pronunciation.
Secondly, a full context-dependent label format is designed. The format has six levels, covering context information at the unit, syllable, word, prosodic word, prosodic phrase and sentence levels. The thesis also extends a Mandarin question set by adding Mandarin-specific and Tibetan-specific questions for context-dependent clustering of HMM states.
Thirdly, the thesis proposes a method for HMM-based Mandarin-Tibetan cross-lingual statistical speech synthesis using speaker adaptive training. A mixed-lingual average model is trained from a large multi-speaker Mandarin corpus and a small single-speaker Tibetan corpus using speaker adaptive training. A speaker adaptation transformation is then used to train a speaker-dependent model from a single-speaker Mandarin corpus and a small single-speaker Tibetan corpus, so that Tibetan or Mandarin speech is synthesized by the same system.
Finally, a Mandarin-Tibetan cross-lingual speech synthesis system is realized. Subjective and objective evaluation results show that speech synthesized with the proposed method is of better quality than speech from SD methods when the number of Tibetan training utterances is small.
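To illustrate how a six-level context-dependent label and a question set might fit together, the following is a minimal Python sketch. It is not the thesis's actual label format, which the abstract does not give. The field names, the `/A:` to `/E:` separators, the `/L:` language tag, the example units and the question names are all hypothetical, loosely modelled on the wildcard-pattern questions commonly used in HMM-based synthesis toolkits.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Context:
    """Hypothetical six-level context for one initial/final unit."""
    prev_unit: str         # unit level: previous, current and next unit
    unit: str
    next_unit: str
    tone: int              # syllable level
    syl_in_word: int       # word level: position of the syllable in its word
    word_in_pword: int     # prosodic word level
    pword_in_pphrase: int  # prosodic phrase level
    pphrase_in_sent: int   # sentence level
    lang: str              # assumed language tag: "MA" (Mandarin) or "TI" (Tibetan)


def to_label(c: Context) -> str:
    """Serialise a context into a single label string (illustrative format)."""
    return (
        f"{c.prev_unit}-{c.unit}+{c.next_unit}"
        f"/A:{c.tone}/B:{c.syl_in_word}/C:{c.word_in_pword}"
        f"/D:{c.pword_in_pphrase}/E:{c.pphrase_in_sent}/L:{c.lang}"
    )


# Each question is a name plus wildcard patterns; a label answers "yes"
# if it matches at least one pattern. All names here are made up.
QUESTIONS = {
    "C-Unit_is_sh": ["*-sh+*"],
    "Syl_Tone_is_4": ["*/A:4/*"],
    "Lang_is_Tibetan": ["*/L:TI"],
}


def answer(label: str, patterns: list[str]) -> bool:
    """Return True if the label matches any pattern of a question."""
    return any(fnmatchcase(label, p) for p in patterns)


label = to_label(Context("sil", "sh", "i", 4, 1, 1, 1, 1, "MA"))
answers = {name: answer(label, pats) for name, pats in QUESTIONS.items()}
```

In decision tree clustering, yes/no answers of this kind are what split the HMM states at each tree node. A language question such as the assumed `Lang_is_Tibetan` is one way that language-specific contexts could be kept apart while other contexts are shared.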
Keywords/Search Tags: polyglot speech synthesis, Mandarin-Tibetan bilingual speech synthesis, Tibetan speech synthesis, speaker adaptive training, HMM-based speech synthesis