Research On Speech Quality Evaluation For Mandarin-Tibetan Cross-Lingual Speech Synthesis

Posted on: 2017-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
GTID: 2348330488970275
Subject: Intelligent information processing
Abstract/Summary:
Cross-lingual speech synthesis, which produces speech in several languages with a single synthesizer, is an active topic in speech signal processing. Northwest Normal University has already realized a Mandarin-Tibetan cross-lingual speech synthesis system. This thesis first implements such a system, then evaluates the speech synthesized under different schemes with subjective and objective methods. It further proposes using speaker recognition to measure the similarity between the synthesized speech and the original speaker, and uses a speech recognition system to evaluate the quality of the synthesized speech. The main work and contributions are as follows.

First, the thesis designs a set of Mandarin-Tibetan cross-lingual speech synthesis schemes for synthesizing different speech. Speech and text corpora of Mandarin and the Lhasa dialect of Tibetan were selected. A context-dependent label format was designed based on the corpora, together with a question set for decision-tree clustering of the models. Speaker adaptive training was used to train the acoustic models, and Mandarin and Tibetan speech was synthesized with a vocoder.

Second, the thesis evaluates the synthesized Mandarin speech and Tibetan Lhasa dialect speech under the different schemes, using both subjective and objective methods. The subjective methods are mean opinion score, degradation mean opinion score, comparison category rating and the diagnostic rhyme test. The objective methods are pitch parameter measurement, duration parameter measurement and perceptual evaluation of speech quality. Tests show that the synthesized Mandarin and Tibetan Lhasa dialect speech reaches high quality when 110 Mandarin and 300 Tibetan training sentences are used, respectively.

Third, speaker recognition is used to evaluate whether the synthesized multi-speaker speech is similar to the original speaker. The speaker recognition system uses a Gaussian mixture model as its acoustic model. The thesis combines traditional short-time processing with empirical mode decomposition to obtain the acoustic features. Tests show that the speaker recognition rates reach 88.89% for synthesized Mandarin speech and 94.44% for synthesized Tibetan speech when the number of sentences used for speaker adaptive training reaches 110 for Mandarin and 300 for Tibetan.

Finally, speech recognition is used to evaluate the quality of the speech synthesized by the Mandarin-Tibetan cross-lingual system. A five-state continuous hidden Markov model serves as the acoustic model. It is trained on a 39-dimensional (13 × 3) feature vector made up of 13-dimensional Mel-frequency cepstral coefficients along with their first and second differences. Tests show that the speech recognition rates reach 96.41% for synthesized Mandarin speech and 91.27% for synthesized Tibetan speech when the number of sentences used for speaker adaptive training reaches 110 for Mandarin and 300 for Tibetan.
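
The 13 × 3 feature vector mentioned for the speech recognition system can be illustrated with a small sketch. The abstract does not say how the first and second differences are computed, so the code below assumes the commonly used regression formula with a window of two frames and repeated edge frames. The function names and the toy input are hypothetical and are not taken from the thesis.

```python
from typing import List

Frame = List[float]


def delta(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based differences over time for a sequence of frames.

    Assumption: window of 2 frames on each side, with the first and last
    frames repeated at the edges. The thesis may use a different scheme.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out: List[Frame] = []
    for t in range(num_frames):
        row: Frame = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_deltas(mfcc: List[Frame]) -> List[Frame]:
    """Append first and second differences to each static frame."""
    first = delta(mfcc)
    second = delta(first)
    return [c + d1 + d2 for c, d1, d2 in zip(mfcc, first, second)]


# Toy input standing in for real MFCCs: 6 frames of 13 coefficients.
toy_mfcc = [[float(t + k) for k in range(13)] for t in range(6)]
features = stack_with_deltas(toy_mfcc)
print(len(features), len(features[0]))
```

Each output frame holds the 13 static coefficients followed by 13 first differences and 13 second differences, which gives the 39 dimensions used to train the acoustic model.
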
Keywords/Search Tags: Mandarin-Tibetan cross-lingual speech synthesis, subjective evaluation, objective evaluation, speaker recognition, speech recognition