
Research On Objective Evaluation Of Pronunciation Quality In An Interactive Language Learning System

Posted on: 2008-03-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C L Li
GTID: 1118360215467524
Subject: Physical Electronics
Abstract/Summary:
Interactive Computer Assisted Language Learning (CALL) systems based on speech processing technology attract much attention in the field of speech technology. CALL systems can change existing language teaching and learning methods and improve their efficiency. This dissertation studies methods for the objective evaluation of English pronunciation quality in interactive language learning systems.

Experts evaluate pronunciation quality in three stages: listening, perception and evaluation. Existing objective evaluation methods use Hidden Markov Models (HMMs) to construct the evaluation model. However, these HMM-based methods cannot provide satisfactory results because they do not model perception. This work develops objective evaluation models of pronunciation quality, covering acoustic, perceptual and prosodic modeling, together with evaluation methods based on these models.

We propose an HMM-based method that uses reference speech to evaluate pronunciation quality objectively; it improves the correlation coefficient from 0.52 to 0.67. The effects of the accent of the native speech, the complexity of the model and the level of alignment on the correlation coefficient are also studied.

A method for the objective evaluation of pronunciation quality based on a perceptual model is proposed. Two perceptual models are proposed, one based on Bark scale mapping and one based on Mel frequency scale mapping. The effects of duration on speech alignment are also studied. The correlation coefficient between the objective score of the perceptual model and the expert score is 0.723, which is higher than that of existing methods based on HMM.

Duration models based on the Gamma distribution and on a histogram distribution are studied. The correlation coefficient between the duration score of the Gamma model and the expert score is 0.66.

We also propose a pitch model based on the pitch of the speech. The difference between the pitch extrema is more important than the mean value: the pitch score derived from the difference between the pitch extrema in vowels has a higher correlation coefficient with the expert score than the score derived from the mean pitch.

Score-fusion methods based on a linear model and on a Support Vector Machine (SVM) are compared. A method is proposed for fusing the scores output by the acoustic model, the perceptual model, the duration model and the pitch model based on the reference speech. The correlation coefficient between the objective score and the expert score is 0.800, which is one of the best results in this research field.

A method of detecting sentence stress in speech is also proposed. In an experiment on the Boston Broadcasting Radio News corpus, the correct rate is 82%.

A criterion for the subjective evaluation of pronunciation quality is established, and a database for the evaluation of pronunciation quality is constructed.
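
The evaluation measure used throughout the abstract, the correlation coefficient between machine scores and expert scores, and the linear variant of score fusion can be illustrated with a small sketch. It is not the dissertation's implementation: it assumes the coefficient is Pearson's (the abstract does not say which one), the fusion weights are hypothetical fixed values rather than trained ones, and all scores below are made-up illustration data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def fuse_linear(sub_scores, weights, bias=0.0):
    """Linear fusion: one weighted sum of sub-scores per utterance."""
    fused = []
    for row in sub_scores:
        if len(row) != len(weights):
            raise ValueError("each row needs one sub-score per weight")
        fused.append(bias + sum(w * s for w, s in zip(weights, row)))
    return fused


# Made-up data: one row per utterance, columns are the
# (acoustic, perceptual, duration, pitch) sub-scores.
sub_scores = [
    [0.62, 0.70, 0.55, 0.48],
    [0.81, 0.77, 0.69, 0.72],
    [0.40, 0.35, 0.52, 0.30],
    [0.90, 0.88, 0.80, 0.85],
    [0.55, 0.60, 0.45, 0.58],
]
expert_scores = [3.0, 4.0, 2.0, 5.0, 3.0]  # hypothetical expert ratings

# Hypothetical equal weights; a real system would fit them on rated data.
weights = [0.25, 0.25, 0.25, 0.25]

fused_scores = fuse_linear(sub_scores, weights)
r = pearson(fused_scores, expert_scores)
print(f"correlation between fused and expert scores: {r:.3f}")
```

In practice the weights would be estimated from a rated training set, for example by least squares, and the correlation would be reported on held-out utterances. An SVM-based fusion would replace `fuse_linear` with a trained regressor.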
Keywords/Search Tags: Pronunciation Quality, HMM, Perceptual Model, Duration Model, Score Fusion, SVM