A Research On Mispronunciation Detection Based On Statistical Pattern Recognition

Posted on: 2009-12-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Wei
Full Text: PDF
GTID: 1118360242995813
Subject: Signal and Information Processing
Abstract/Summary:
With advances in automatic speech recognition, computer-assisted language learning (CALL) systems keep improving. The key technique for enhancing a CALL system is automatic mispronunciation detection: with such a module, the system can give the learner specific advice and select the most suitable training materials. This thesis studies mispronunciation detection in detail on the basis of statistical pattern recognition, especially statistical speech recognition, covering acoustic feature extraction, acoustic modeling, and mispronunciation detection algorithms. Drawing on this work, the thesis sheds light on the nature of mispronunciation detection. The research and results are summarized as follows.

First, the thesis improves the mispronunciation detection algorithm based on statistical speech recognition techniques. Introducing CMN (cepstral mean normalization) and VTLN (vocal tract length normalization) reduces the mismatch between the acoustic model and the learner. At the same time, a BIC-based parsimonious model construction method puts the emphasis on confusable phonemes and on phonemes that are frequently mispronounced. MLLR (Maximum Likelihood Linear Regression) then adapts the acoustic model to the target speaker to further reduce the acoustic mismatch. Finally, a revised posterior probability based on the mispronunciation confusion matrix is used as the mispronunciation measure, with a phoneme-dependent threshold.

Then, the posterior probability is extended to include the prior probability of mispronunciation through TMPP (Text-dependent Mispronunciation Prior Probability), which leads to TCPP (Text-dependent Correct Pronunciation Posterior Probability). Inspired by language modeling in speech recognition, absolute discounting is used to address the zero-probability problem. Experimental results indicate that TCPP significantly outperforms the original posterior probability.
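The abstract does not give the TCPP formula, so the following is only a minimal sketch of the general idea: smooth the mispronunciation priors with absolute discounting, then combine them with acoustic log-likelihoods through Bayes' rule. The function names, the uniform redistribution of the discounted mass over unseen phones, the discount value, and the toy phone inventory are all assumptions made for illustration.

```python
import math


def absolute_discount(counts, inventory, d=0.5):
    """Smooth P(realized phone | canonical phone) with absolute discounting.

    counts:    dict mapping realized phone -> integer count (>= 1 if present)
    inventory: list of every phone the learner could produce
    d:         discount with 0 < d < 1 (assumed value, not from the thesis)
    """
    total = sum(counts.get(p, 0) for p in inventory)
    if total == 0:
        # No observations at all: fall back to a uniform prior.
        return {p: 1.0 / len(inventory) for p in inventory}

    seen = [p for p in inventory if counts.get(p, 0) > 0]
    unseen = [p for p in inventory if counts.get(p, 0) == 0]
    if not unseen:
        # Nothing needs rescuing from zero probability.
        return {p: counts[p] / total for p in inventory}

    # Mass removed from the seen phones is shared equally among unseen ones.
    freed = d * len(seen) / total
    probs = {p: (counts[p] - d) / total for p in seen}
    probs.update({p: freed / len(unseen) for p in unseen})
    return probs


def correct_posterior(log_likes, priors, canonical):
    """Posterior that the canonical phone was produced, given per-phone
    acoustic log-likelihoods and text-dependent priors."""
    scores = {p: log_likes[p] + math.log(priors[p]) for p in log_likes}
    peak = max(scores.values())  # log-sum-exp for numerical stability
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores.values()))
    return math.exp(scores[canonical] - log_norm)


# Hypothetical counts of how learners realized the canonical phone "zh".
inventory = ["zh", "z", "j", "ch"]
priors = absolute_discount({"zh": 90, "z": 10}, inventory)
log_likes = {"zh": -50.0, "z": -49.0, "j": -60.0, "ch": -58.0}
print(correct_posterior(log_likes, priors, "zh"))
```

Without the discounting step, the never-observed substitutions "j" and "ch" would receive a prior of exactly zero and could never be detected, which is the zero-probability problem the abstract refers to.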
TCPP also performs at least as well as the heuristic posterior probability method based on the mispronunciation confusion matrix, while the heuristic method suffers from the zero-probability problem and neglects how frequently each mispronunciation occurs.

Next, by examining confidence measures in speech recognition, the thesis identifies another approach to mispronunciation detection: building a classifier on features obtained from the speech recognizer. This approach is also widely used in speaker verification. The thesis uses likelihood ratios as features and an SVM (Support Vector Machine) as the classifier. As a discriminative classifier, the SVM captures the discriminative information embedded in the likelihood ratios by learning from human-labeled mispronunciation data, which improves mispronunciation detection performance.

The thesis then analyzes the disadvantages of phoneme-based acoustic models. Such models cope poorly with partially wrong pronunciations and with mispronunciations that lie far from every correct phoneme. For a given phoneme there are in fact standard, correct, and wrong pronunciations. The thesis introduces the PSM (Pronunciation Space Model) to describe the characteristics of pronunciation, and uses dedicated mispronunciation acoustic models to handle the variety of mispronunciations. A large amount of pronunciation data collected from many speakers and environments is used to build the mispronunciation models by unsupervised clustering. The mispronunciation models comprise a "standard pronunciation model", an "accented pronunciation model", and a "heavily accented pronunciation model". An SVM then classifies mispronunciations versus correct pronunciations based on the likelihood ratios obtained from these models.
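The classifier-based approach can be illustrated with a small sketch: log-likelihood ratios between the canonical model and competing models form the feature vector, and a linear SVM separates correct from mispronounced tokens. The abstract does not state which SVM implementation or kernel was used, so the tiny linear SVM below (trained by subgradient descent on the hinge loss) is a stand-in, and the feature layout, hyperparameters, and toy data are assumptions.

```python
def llr_features(log_likes, canonical, competitors):
    """Log-likelihood ratios of the canonical model against each competitor.

    log_likes: dict mapping model name -> log-likelihood of the segment
    """
    return [log_likes[canonical] - log_likes[c] for c in competitors]


def train_linear_svm(samples, labels, lam=0.01, lr=0.01, epochs=500):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.

    labels are +1 (correct pronunciation) or -1 (mispronunciation).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularizer shrinks the weights on every step.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                # The hinge loss is active: move the boundary toward y.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 for 'correct' and -1 for 'mispronounced'."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, hand-made training set: two likelihood-ratio features per token.
samples = [[2.0, 3.0], [1.5, 2.5], [2.5, 1.0],
           [-1.0, 2.0], [0.5, -2.0], [-2.0, -1.0]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)

token = llr_features({"zh": -50.0, "z": -52.0, "j": -53.0}, "zh", ["z", "j"])
print(predict(w, b, token))
```

In the PSM setting described above, the same kind of classifier would take ratios computed from the clustered pronunciation models instead of ratios between phoneme models.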
Experimental results indicate that PSM significantly outperforms the original phoneme-based acoustic models.

Finally, the thesis investigates tone mispronunciation detection and introduces a maximum likelihood pitch mean normalization method to handle differences between speakers. A maximum likelihood feature selection method addresses the half-frequency and double-frequency errors of pitch extraction. Experiments based on these methods show improved tone mispronunciation detection.
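The abstract does not spell out the maximum likelihood formulations used for pitch, so the sketch below shows only a simplified version of the two ideas. For each voiced frame it keeps whichever of the candidates f/2, f, and 2f is most likely under a single Gaussian over log-F0, and it then removes the speaker's mean log-F0. The single-Gaussian model, its parameters, and the function names are assumptions for illustration.

```python
import math


def gaussian_loglike(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fix_octave_errors(f0_hz, mean_log, var_log):
    """For each voiced frame, keep the most likely of f/2, f, and 2f.

    mean_log and var_log describe the speaker's log-F0 distribution
    (assumed to be estimated elsewhere, e.g. from a robust first pass).
    """
    fixed = []
    for f in f0_hz:
        candidates = [f / 2.0, f, f * 2.0]
        best = max(candidates,
                   key=lambda c: gaussian_loglike(math.log(c), mean_log, var_log))
        fixed.append(best)
    return fixed


def normalize_log_f0(f0_hz):
    """Subtract the speaker's mean log-F0 so contours are comparable."""
    logs = [math.log(f) for f in f0_hz]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


# Toy track in Hz with one halved frame (105) and one doubled frame (400).
track = [200, 210, 105, 195, 400, 205]
cleaned = fix_octave_errors(track, mean_log=math.log(200), var_log=0.04)
print(cleaned)
print(normalize_log_f0(cleaned))
```

After normalization, two speakers who produce the same tone contour at different absolute pitch levels yield the same feature sequence, which is what the tone models need.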
Keywords/Search Tags: speech recognition, mispronunciation detection, support vector machine, pronunciation space model, tone mispronunciation detection