A Study Of Key Technologies To Freely Spoken Mandarin Speech Evaluation

Posted on:2017-04-10

Degree:Master

Type:Thesis

Country:China

Candidate:S K Xu

Full Text:PDF

GTID:2308330485451803

Subject:Information and Communication Engineering

Abstract/Summary:

Automatic pronunciation proficiency evaluation works as follows: examinees are required to speak according to certain rules, and a computer automatically scores the quality and standardness of their pronunciation. The traditional evaluation task is context-dependent, meaning that students must read a given text exactly, as in read-aloud or strict recitation tasks. In this setting, the frame-normalized log phone posterior probability is the most useful feature for representing pronunciation quality, and it correlates highly with human scores. This technology has been widely and successfully used in many applications. In the context-independent setting, however, evaluation is not as easy or straightforward. For example, examinees may be asked to speak freely on prepared topics, and human examiners score not only the quality of pronunciation but also how standard the vocabulary and grammar are. Very little research has been conducted on this task so far, and this thesis presents some initial research on it. Specifically, we work on the fourth part of the Putonghua Shuiping Ceshi (PSC) in mainland China. This part asks examinees to speak freely for three minutes on a given topic, so it matches our purpose well. The main work of this thesis is summarized as follows:

First, this thesis proposes a way to use speech recognition to compute the posterior probability feature for the context-independent task, just as is usually done in the context-dependent setting to evaluate pronunciation quality. We use a DNN-HMM model to recognize the speech produced by examinees and compute the posterior probability of each phoneme in the recognition result given its observations, with some improvements based on information specific to the PSC test. Experiments show that this posterior probability feature also correlates highly with human scores.

Second, the posterior probability calculation in this thesis relies heavily on the performance of speech recognition. To improve the accuracy of the recognizer, we use a recurrent neural network (RNN) language model to rescore the N-best candidates produced by the first recognition pass and select the highest-scoring candidate as the new recognition result. Experiments show that both the recognition accuracy and the correlation between the features and human scores improve.

Third, to assess the influence of dialect on the posterior probability, we introduce likelihood scores produced by dialect-clustered nodes added to the deep neural network acoustic model, which is retrained in a multi-lingual style. This gives the likelihood of each frame's observation under the dialect model. We add this likelihood score to the denominator of the posterior probability equation so that the degree of dialect is taken into account.

Fourth, we compute the average number of frames per phoneme to reflect the fluency of pronunciation. We also use a conditional random field (CRF) model to judge whether each position in the recognition result is sentence-internal or sentence-final, which yields better sentence boundary information. Experiments show that this improves the correlation between the fluency feature and human scores.

Finally, we apply a vector space model (VSM) directly to the recognition results to reflect how standard the vocabulary and grammar are. We find that an unsupervised restricted Boltzmann machine (RBM) transform of the VSM achieves good performance. For fairness, we also do some work on cheating that involves specific expressions.
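As a rough illustration of two features named in the abstract, the sketch below computes a frame-normalized log phone posterior and the average number of frames per phoneme from a toy alignment. It is a minimal sketch, not the thesis's implementation: the function names, the `(phone, per-frame posteriors)` data layout, and the numbers are all hypothetical, and the per-frame posteriors are assumed to come from some acoustic model and alignment that are not shown.

```python
import math


def frame_normalized_log_posterior(frame_posteriors):
    """Mean log posterior of one phone over the frames aligned to it.

    frame_posteriors: per-frame posterior probabilities (each in (0, 1])
    of the recognized phone. Hypothetical input layout.
    """
    if not frame_posteriors:
        raise ValueError("phone segment has no frames")
    return sum(math.log(p) for p in frame_posteriors) / len(frame_posteriors)


def average_frames_per_phone(segments):
    """Average segment length in frames, used here as a simple fluency cue."""
    return sum(len(frames) for _, frames in segments) / len(segments)


# Toy alignment: (phone label, posterior of that phone at each aligned frame).
# The values are made up for illustration only.
segments = [
    ("n", [0.9, 0.8, 0.85]),
    ("i", [0.6, 0.7]),
    ("h", [0.3, 0.4, 0.35, 0.5]),
]

# One score per phone segment, then a simple utterance-level average.
scores = {phone: frame_normalized_log_posterior(frames) for phone, frames in segments}
utterance_score = sum(scores.values()) / len(scores)
avg_frames = average_frames_per_phone(segments)
```

Dividing by the number of frames keeps long and short phone segments comparable, which is the point of the frame normalization; scores closer to zero indicate that the acoustic model is more confident in the recognized phone.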
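The N-best rescoring step described in the abstract can be sketched in the same spirit. The abstract does not say how the first-pass score and the language model score are combined, so the weighted sum below is an assumption, as are the function names and the toy scores; a lookup table stands in for the RNN language model.

```python
def rescore_nbest(candidates, lm_score, lm_weight=1.0):
    """Return the (text, first_pass_score) pair with the best combined score.

    candidates: list of (text, first_pass_log_score) from the first pass.
    lm_score:   callable mapping text to a log probability; a stand-in
                for an RNN language model (assumption).
    lm_weight:  interpolation weight for the language model score (assumption).
    """
    if not candidates:
        raise ValueError("empty N-best list")

    def combined(candidate):
        text, first_pass = candidate
        return first_pass + lm_weight * lm_score(text)

    return max(candidates, key=combined)


# Toy language model scores (log probabilities), made up for illustration.
toy_lm = {
    "wo3 xi3 huan1 du2 shu1": -4.0,  # plausible sentence
    "wo3 xi3 huan1 du3 shu1": -9.0,  # acoustically similar, less plausible
}


def toy_lm_score(text):
    return toy_lm[text]


# The first pass slightly prefers the less plausible candidate.
nbest = [
    ("wo3 xi3 huan1 du3 shu1", -10.0),
    ("wo3 xi3 huan1 du2 shu1", -10.5),
]

best_text, best_first_pass = rescore_nbest(nbest, toy_lm_score, lm_weight=1.0)
```

In this toy case the language model score outweighs the small first-pass difference, so the second candidate is selected as the new recognition result.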

Keywords/Search Tags:

PSC, freely spoken speech, pronunciation quality evaluation, posterior probability, multi-lingual neural network, recurrent neural network, conditional random field, vector space model

Related items

1	Speech Enhancement Based On Deep Neural Network And Recurrent Neural Network
2	Research On The Automatic Evaluation Of Pronunciation Proficiency
3	Cross-lingual Speech Synthesis Based On Statistical Models
4	Research On Deep Neural Networks For Multi-focus Image Fusion
5	Research On The Normalization Of Spoken Language In Speech-to-speech Translation
6	Research On Named Entity Recognition Of Chinese Image Reports Based On Recurrent Neural Networks
7	Video Quality Evaluation Based On Recurrent Neural Network
8	The Research On Spoken Language Understanding Based On Recurrent Neural Network
9	Research On Speech Enhancement Method Based On Parallel Optimize Recurrent Neural Network
10	Road Segmentation And Extraction Of High Resolution Aerial Imagery Based On Recurrent Neural Networks