A Study On Automatic Mispronunciation Detection Based On Statistical Pattern Recognition

Posted on: 2010-07-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Zhang
GTID: 1118360275455547
Subject: Signal and Information Processing
Abstract/Summary:
Automatic mispronunciation detection is the key technique in computer-assisted language learning (CALL) systems. With an automatic mispronunciation detection module, a CALL system can evaluate a language learner, analyze the learner's pronunciation defects, and give specific advice and the most suitable training materials to improve the learner's pronunciation. This thesis focuses on automatic mispronunciation detection based on statistical pattern recognition and carries out research on the acoustic model and on back-end processing. The specific work and findings are summarized below.

Firstly, following a survey of current technology, the thesis adopts an automatic mispronunciation detection system based on statistical speech recognition as its basic framework and gives a brief introduction to this system. It also describes in detail the algorithms used to compute mispronunciation scores and their defects in actual use. To eliminate these defects, the SLPP algorithm is proposed. While introducing the experimental databases, the thesis analyzes the consistency of the experts' mispronunciation judgments on these databases; this indicates the level of performance achieved by human raters and shows that automatic mispronunciation detection is a challenging task.

Secondly, in acoustic modeling, to reduce the mismatch between training and testing data and to build a speaker-independent canonical model, the thesis introduces adaptation techniques into the mispronunciation detection system in both testing and training. In testing, speaker adaptation based on maximum likelihood linear regression (MLLR), originally developed for speech recognition, is introduced. Taking into account the different objectives of speech recognition and mispronunciation detection, a selective maximum likelihood linear regression (SMLLR) strategy is proposed specifically for mispronunciation detection. In training, speaker adaptive training (SAT), also taken from speech recognition, is introduced as a form of speaker normalization that reduces the overlap in the speaker-independent model caused by variation among the training speakers. SAT and SMLLR must be used together, because using the canonical model alone leads to a greater mismatch with the testing data.

Thirdly, still in acoustic modeling and in addition to adaptation, the thesis draws on discriminative training, originally developed for speech recognition, and analyzes objective functions consistent with the goal of mispronunciation detection. A review of the various discriminative training methods for speech recognition shows how these methods connect to the goal of speech recognition. Based on an analysis of the mispronunciation detection task and the related objective functions, the thesis proposes that the discriminative objective function must be consistent with the measure used for mispronunciation scoring. Furthermore, mispronounced samples are needed in the training database for discriminative training aimed at mispronunciation detection.

Fourthly, besides a proper acoustic modeling strategy, improving back-end processing can also improve the mispronunciation detection system. The thesis proposes a three-dimensional back-end normalization strategy and a machine learning back-end processing strategy. The three dimensions are the speaker level, the content level, and the time level. Based on an analysis of the expert ratings and experimental data, the thesis proposes a feature for the speaker's overall pronunciation score at the speaker level; based on an analysis of the content-dependent posterior probability algorithm, it proposes a phoneme-related feature at the content level; and to address problems arising in actual use, it proposes a context-related feature at the time level. To use these three features, the thesis proposes the three-dimensional back-end normalization strategy. To avoid some defects of this strategy, a machine learning back-end processing strategy is proposed, which handles an incrementally growing set of features well.

Finally, the strategies above for acoustic modeling and back-end processing yield a reliable mispronunciation detection system. On the basis of this system, the thesis proposes a strategy for automatically updating the acoustic model through mispronunciation modeling. The necessity of mispronunciation modeling is demonstrated by analyzing the mispronunciation scoring algorithms. Several strategies for modeling mispronunciations are proposed; among them, the half-supervised (semi-supervised) cluster modeling strategy based on unsupervised parameter estimation performs best. Consequently, by combining the reliable system with the mispronunciation modeling algorithm, the thesis proposes a strategy for automatically updating the acoustic model for mispronunciation detection, which can continuously improve the acoustic modeling space and the performance of the system.
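
The posterior-probability scoring that the abstract refers to can be illustrated with a minimal sketch. This is a generic duration-normalized log-posterior score compared against a fixed threshold, not the thesis's SLPP algorithm, whose details the abstract does not give. The function names, the equal-prior assumption over phone models, the input layout, and the threshold value are all illustrative assumptions.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_posterior_score(frame_loglik, target):
    """Duration-normalized log posterior of the target phone for one segment.

    frame_loglik: one entry per frame, each a list of per-phone acoustic
                  log-likelihoods (hypothetical layout for this sketch).
    target:       index of the phone the learner was expected to say.
    Assumes equal priors over all phone models.
    """
    n_frames = len(frame_loglik)
    n_phones = len(frame_loglik[0])
    # Segment-level log-likelihood of each phone model.
    segment = [sum(frame[p] for frame in frame_loglik) for p in range(n_phones)]
    # Log posterior of the target phone, averaged over the segment length.
    return (segment[target] - log_sum_exp(segment)) / n_frames


def is_mispronounced(score, threshold):
    """Flag the segment when its score falls below the threshold."""
    return score < threshold


# Two frames, two phone models; phone 0 fits the audio much better.
frames = [[-1.0, -3.0], [-1.0, -3.0]]
good = log_posterior_score(frames, target=0)
bad = log_posterior_score(frames, target=1)
print(is_mispronounced(good, threshold=-1.0), is_mispronounced(bad, threshold=-1.0))
```

In this toy input the segment expected to be phone 0 scores close to zero, while the same audio scored against phone 1 falls well below the threshold. In a real system the threshold would be tuned against expert labels, and the back-end features described in the abstract would adjust the raw score.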
Keywords/Search Tags:Automatic Mispronunciation Detection, Statistical Speech Recognition, SLPP, SMLLR, DT, Back-end Processing, Machine Learning, Half-supervised Cluster