
Research On The Automatic Evaluation Of Pronunciation Proficiency

Posted on: 2013-09-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Yan
GTID: 1228330377951756
Subject: Signal and Information Processing
Abstract/Summary:
Automatic pronunciation proficiency evaluation (hereinafter referred to as pronunciation evaluation) plays an important role in computer-assisted language learning (CALL). It works as follows: students read given texts aloud, and a computer scores their overall language competence. This enables CALL systems to act as "virtual teachers" that efficiently provide objective feedback, which can greatly relieve the widespread teacher shortage. Pronunciation evaluation technology offers many potential benefits to both language teaching and learning. In language learning applications, it helps students better understand their overall pronunciation quality, which improves learning efficiency and facilitates self-study. In language teaching applications, it can help (or even replace) teachers in carrying out the scoring task, which greatly reduces workload and improves objectivity. Pronunciation evaluation has therefore become a hot topic for both speech scientists and education researchers. In the study of pronunciation evaluation, the frame-normalized log phone posterior probability (hereinafter referred to as phone posterior probability, or PPP) is the most promising feature for measuring students' pronunciation proficiency at the phonetic level. However, the in-depth investigation in this dissertation revealed two major defects of the PPP measure as currently used. First, PPPs of different phones cannot measure phonetic pronunciation quality consistently. Second, the acoustic model is an important part of PPP calculation, yet current modeling approaches, which originate from automatic speech recognition (ASR), do not give satisfactory results. The objective of this dissertation is to resolve these two problems, and the work is innovative in both evaluation feature extraction and evaluation-oriented acoustic modeling.
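The abstract names the frame-normalized log phone posterior probability but does not state its formula. As a rough illustration only, the sketch below uses one common formulation: the log posterior of the target phone over a set of competing phones, divided by the number of frames in the segment. The function names, the dictionary of per-phone segment log-likelihoods, and the default of uniform phone priors are all assumptions made for this example and are not taken from the dissertation.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def frame_normalized_log_ppp(segment_loglik, target, n_frames, log_priors=None):
    """Frame-normalized log posterior of `target` for one phone segment.

    segment_loglik: dict mapping each candidate phone to log p(O | phone),
        the acoustic log-likelihood of the segment under that phone's model
        (hypothetical input; how it is computed is outside this sketch).
    target: the phone the student was supposed to pronounce.
    n_frames: number of acoustic frames in the segment.
    log_priors: optional dict of log P(phone); uniform priors are assumed
        when omitted, in which case the prior cancels out.
    """
    if log_priors is None:
        log_priors = {phone: 0.0 for phone in segment_loglik}
    joint = {
        phone: loglik + log_priors[phone]
        for phone, loglik in segment_loglik.items()
    }
    log_posterior = joint[target] - log_sum_exp(list(joint.values()))
    return log_posterior / n_frames


if __name__ == "__main__":
    # Two equally likely candidates over a 5-frame segment.
    print(frame_normalized_log_ppp({"a": -10.0, "o": -10.0}, "a", 5))
```

Because the score is a log posterior, it is never positive, and it moves toward zero as the target phone's model explains the segment better than its competitors do.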
The main work and research findings are summarized as follows.
First, this dissertation proposed the phone-dependent posterior probability transform (PPPT) algorithm. Our work proved that, even when infinite data are available, PPPs of different phones still cannot measure phonetic pronunciation quality consistently, because they are affected by the probability spaces. The trainable PPPT algorithm was therefore presented to deal with this problem. The PPPTs are trained by minimizing the mean square error between human and machine scores (the MMSE criterion), so that the transformed PPPs measure phonetic pronunciation proficiency much more consistently. This dissertation investigated a linear transform and a non-linear sigmoid transform, deriving an explicit solution (linear regression) for the former and gradient descent formulae for the latter. The experimental results showed that both transforms yield significantly better results. Further study showed that system performance improves again when the transforms are combined with probability space refinement, the conventional way of addressing the same problem.
Second, this dissertation proposed a novel approach to evaluation-oriented acoustic modeling. The acoustic model is an important part of PPP calculation. However, because pronunciation evaluation research originated from ASR, researchers still use ASR methods for acoustic modeling. These approaches neglect the nature of pronunciation evaluation and cause the following problems. If accented speech is used for modeling, the resulting acoustic models "tolerate" accented speech, which seriously degrades performance. If only standard pronunciations are used, the resulting "golden" acoustic models mismatch the accented speech encountered at test time and cannot measure pronunciation quality accurately.
Therefore, a novel evaluation-oriented acoustic modeling approach was proposed to deal with the problem. In this scheme, the acoustic model is built under the MMSE criterion using both standard and accented speech. The mismatch between training and test is thereby eliminated, and the resulting model is called an evaluation-oriented acoustic model. This modeling approach is designed within the PPP framework, so related techniques such as probability space refinement and the phone-dependent posterior probability transform can be embedded directly in the model training algorithm. Experimental results showed that the proposed modeling approach performs significantly better than traditional modeling approaches. The results also showed that it is necessary both to include standard and accented speech in acoustic modeling and to optimize model parameters with criteria related to pronunciation evaluation.
Third, this dissertation proposed unsupervised speaker adaptation based on the evaluation-oriented mapping transform (EMT). In the unsupervised mode of speaker adaptation, labeling errors are inevitable. Adaptation approaches based on the maximum likelihood estimation (MLE) or maximum a posteriori (MAP) criteria, which are much less sensitive to labeling errors, therefore play a dominant role in unsupervised speaker adaptation. However, in-depth investigation showed that the MMSE modeling approach proposed above does not match the MLE/MAP criteria, so it is not easy to build a speaker-dependent (SD) evaluation-oriented acoustic model directly. Instead, this dissertation proposed an indirect way to build a speaker-dependent evaluation-oriented acoustic model, based on a set of linear transforms referred to as evaluation-oriented mapping transforms (EMTs). As before, EMTs are built under the MMSE criterion using both standard and accented speech, so the mismatch between training and test is again eliminated.
At test time, speaker-dependent acoustic models are estimated under the MLE/MAP criteria and the EMTs are applied to them, which yields speaker-dependent evaluation-oriented acoustic models. This indirect modeling approach takes advantage of both MLE/MAP, which is effective and robust in unsupervised adaptation, and evaluation-oriented modeling, which makes acoustic models excel at measuring pronunciation proficiency. Experimental results showed that in the speaker-independent (SI) system, in which speaker adaptation is not used, the proposed approach performs similarly to the evaluation-oriented acoustic modeling method presented above; in the speaker-dependent system, the proposed approach yields a significant gain over traditional MLE/MAP adaptation methods.
Finally, this dissertation proposed a unified framework for building system-specific EMTs. Human scores are the basis for EMT modeling. However, human scores also reflect fluency and completeness, which are not connected to posterior probability. Furthermore, the MMSE criterion may not be suitable for many real applications. To deal with these problems, this dissertation proposed embedding the specific pronunciation evaluation system in EMT training and derived the EMT update formula. In-depth investigation showed that system-specific terms only affect the calculation of the "phonetic slope". Therefore, once all the phonetic slopes in the training set have been computed, a unified approach can be used to update the EMTs. This approach is referred to as "a unified framework of EMT training", and it greatly expands the scope of EMT application. In the experiments, this dissertation successfully embedded the PSC (Putonghua Shuiping Ceshi, the Mandarin Chinese pronunciation proficiency test) pronunciation evaluation system into EMT training via this unified framework and demonstrated its effectiveness.
Lastly, this dissertation also embedded the refined PPPT approach in EMT training via this unified framework, and the final system significantly outperforms nationally certified experts. This shows that combining PPPT with the unified framework of EMT training provides an effective way to solve the two major problems of PPP identified above.
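The linear variant of the phone-dependent posterior probability transform described in the abstract has an explicit linear-regression solution under the MMSE criterion. The sketch below illustrates that idea in a deliberately simplified setting: it assumes a human score is available for every phone token, so each phone's slope and offset can be fitted independently by ordinary least squares. The abstract does not say at what granularity the human scores are given, and the function names and data layout here are hypothetical, so this should be read as an illustration of the principle and not as the dissertation's training procedure.

```python
from collections import defaultdict


def fit_linear_pppt(samples):
    """Fit one linear transform (slope, offset) per phone by least squares.

    samples: iterable of (phone, ppp, human_score) triples. Token-level
        human scores are an assumption made for this sketch.
    Returns a dict mapping phone -> (slope, offset) that minimizes the mean
    square error between slope * ppp + offset and the human score.
    """
    grouped = defaultdict(list)
    for phone, ppp, score in samples:
        grouped[phone].append((ppp, score))

    transforms = {}
    for phone, pairs in grouped.items():
        n = len(pairs)
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
        if var_x == 0.0:
            # All PPPs identical: the best constant predictor is the mean score.
            transforms[phone] = (0.0, mean_y)
            continue
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        slope = cov_xy / var_x
        transforms[phone] = (slope, mean_y - slope * mean_x)
    return transforms


def apply_pppt(transforms, phone, ppp):
    """Map a raw PPP to a transformed score with that phone's own transform."""
    slope, offset = transforms[phone]
    return slope * ppp + offset


if __name__ == "__main__":
    data = [("a", -1.0, 3.0), ("a", -2.0, 1.0), ("a", -3.0, -1.0),
            ("sh", -2.0, 3.0), ("sh", -4.0, 2.0)]
    fitted = fit_linear_pppt(data)
    print(fitted)
    print(apply_pppt(fitted, "sh", -3.0))
```

Giving each phone its own slope and offset is what lets the same raw PPP value map to different scores for different phones, which is the inconsistency the transform is meant to correct.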
Keywords/Search Tags: Automatic Pronunciation Evaluation, Phone Posterior Probability, Phone-dependent Posterior Probability Transform, Evaluation-oriented Acoustic Model, Evaluation-oriented Mapping Transform, PSC