The National Language And Accent Pronunciation Dictionary Adaptive Mandarin Speech Recognition

Posted on: 2011-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 2208360308480955
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
This dissertation concentrates on Chinese speech recognition for non-native speakers, a problem that is almost unavoidable in LVCSR (Large Vocabulary Continuous Speech Recognition). Taking the Putonghua spoken by native speakers of Naxi as the target, we attempt to build accent-specific speech recognizers from an available standard Putonghua speech recognizer, based on the Initial-Final structure of the Chinese language in combination with the regular patterns of pronunciation variation in this minority accent.

The contributions of this dissertation are as follows:
(1) Baseline hidden Markov models (the baseline system) were trained on the Project 863 standard Mandarin corpus using the HTK platform.
(2) For the speech of the Naxi minority in Yunnan, non-native Mandarin speech recognition was studied by applying the general speaker adaptation techniques MLLR and MAP.
(3) The non-native speech data from the Naxi area in Yunnan was first transcribed with the adapted baseline HMMs. The recognized transcription was then aligned with the reference transcription through dynamic programming (DP). Finally, confusion matrices of base syllables, initials and finals were computed.
(4) The regular variation of initials, finals and syllables in minority-accented Putonghua was studied using a data-driven method in combination with expert knowledge. A novel strategy was proposed to automatically construct the multi-pronunciation lexicon of a given accent (or speaker) from its syllable confusion matrix; the strategy can easily be extended to other accents.
(5) The effectiveness of the speaker-dependent and accent-dependent pronunciation dictionaries was verified.

Experimental results show that, with the baseline system and a language model, the highest base-syllable correct rate was 50.26%. With MLLR+MAP adaptation, the base-syllable correct rate rose to 80.56%. After acoustic model adaptation, using the speaker-dependent and accent-dependent pronunciation dictionaries gave better recognition rates of 85.15% and 82.59%, respectively.
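
The alignment, confusion-matrix and lexicon-construction steps described in (3) and (4) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`align`, `confusion_counts`, `build_lexicon`), the unit-cost edit distance, the relative-frequency threshold and the toy syllables are all assumptions made for illustration. The abstract does not state the actual selection criterion or any specific Naxi-accent confusions.

```python
from collections import defaultdict


def align(ref, hyp):
    """Align two syllable sequences by dynamic programming (unit-cost edit distance).

    Returns (ref_unit, hyp_unit) pairs; None marks a deletion or insertion.
    """
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # reference unit was deleted
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # extra unit was inserted
            j -= 1
    pairs.reverse()
    return pairs


def confusion_counts(utterances):
    """Count how often each reference syllable was recognized as each syllable."""
    counts = defaultdict(lambda: defaultdict(int))
    for ref, hyp in utterances:
        for r, h in align(ref, hyp):
            if r is not None and h is not None:
                counts[r][h] += 1
    return counts


def build_lexicon(counts, threshold=0.3):
    """Keep the canonical pronunciation plus every variant whose relative
    frequency reaches the (assumed) threshold, most frequent variant first."""
    lexicon = {}
    for ref, row in counts.items():
        total = sum(row.values())
        variants = [h for h in sorted(row, key=lambda h: -row[h])
                    if h != ref and row[h] / total >= threshold]
        lexicon[ref] = [ref] + variants
    return lexicon


# Toy (reference, recognized) pairs, invented purely for illustration.
toy_data = [
    (["shi", "jie"], ["si", "jie"]),
    (["shi"], ["si"]),
    (["shi"], ["shi"]),
    (["shi"], ["xi"]),
]
toy_lexicon = build_lexicon(confusion_counts(toy_data))
```

In this sketch the same procedure gives a speaker-dependent lexicon when the input pairs come from one speaker and an accent-dependent lexicon when they are pooled over all speakers of the accent. Lowering the threshold admits more variants, at the cost of more confusable entries in the dictionary.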
Keywords/Search Tags: speech recognition, Naxi-accented Mandarin, pronunciation variations, acoustic model adaptation, pronunciation dictionary adaptation (PDA)