Research On The Assessment Of Mandarin Pronunciation Of Tibetan Speakers

Posted on: 2022-08-02 | Degree: Master | Type: Thesis
Country: China | Candidate: J L Jiang | Full Text: PDF
GTID: 2518306500956929 | Subject: Circuits and Systems
Abstract/Summary:
The Tibetans are one of the fifty-five ethnic minorities in China. They are widely distributed and have a large population. When Tibetan speakers speak Mandarin, they tend to make certain fixed types of pronunciation errors that stem from the pronunciation habits of their native language. With the rapid development of speech processing technology and artificial intelligence, Computer Aided Language Learning (CALL) has become increasingly popular, and this form of online education is likely to become a new trend. This thesis takes the Mandarin pronunciation of Tibetan learners in Gansu Province as its research object. From the perspective of linguistics, it compares and analyzes the acoustic characteristics of Tibetan and Mandarin in order to build a suitable corpus. Based on the related theories of speech signal processing and machine learning, it extracts the pronunciation characteristics of Tibetan speakers' Mandarin, performs mispronunciation detection at the syllable level, and implements similarity measures for pronunciation and overall quality at the phoneme level. The main work and original contributions of this thesis are as follows.

Firstly, a corpus suitable for research on the assessment of the Mandarin pronunciation of Tibetan speakers is established. The thesis compares and analyzes the acoustic characteristics of Tibetan and Mandarin from the perspective of linguistics and summarizes the characteristics of Tibetan speakers' Mandarin pronunciation. On this basis, the text design and voice recording are carried out. Annotation rules are then set for the audio recordings in a hierarchical format: the phrase layer is marked with Chinese characters; the syllable layer is marked with pinyin; and the phoneme layer is labeled with SAMPA-TSC, which is designed by combining existing work with the typing characteristics of the Praat software. Finally, the corpus is evaluated in terms of coverage, completeness, quality and reusability.

Secondly, an improved automatic syllable and phoneme segmentation method is implemented. Starting from text-to-audio alignment, the thesis first establishes a segmentation method based on the Hidden Semi-Markov Model (HSMM). With this method, the final F1 values for syllables and phonemes are 36.16% and 38.99% respectively, and the correct segmentation rates for syllables and phonemes are 64.09% and 47.71% respectively. To obtain better segmentation results, the thesis then sets up an onset detection-based method. With this method, the F1 values for syllables and phonemes are 75.09% and 75.23% respectively, and the correct segmentation rates for syllables and phonemes are 84.19% and 60.49% respectively.

Thirdly, a method of mispronunciation detection at the syllable level is implemented. Mispronunciation detection is a sub-task of Tibetan-speaker Mandarin pronunciation assessment, and it is built on syllable segmentation and a discriminative model. The syllable segmentation is based on onset detection; its F1 value is 77.61% and its correct segmentation rate is 86.74%. The baseline discriminative model is a Recurrent Neural Network (RNN) with Bi-directional Long Short Term Memory (Bi-LSTM) units. Other deep learning techniques, namely Convolutional Neural Network (CNN) layers, Attention and Dropout, are then integrated into the baseline model. The final model reaches an accuracy of 62.37% for mispronunciation detection on the test set.

Finally, a method for measuring the similarity of pronunciation and overall quality at the phoneme level is implemented. This is another sub-task of Tibetan-speaker Mandarin pronunciation assessment. The thesis first introduces a phoneme embedding neural network (an RNN with Bi-LSTM units) whose main part consists of one or more recurrent layers and which converts variable-length phoneme segments into fixed-length vectors. Based on the results on the validation set, the final number of layers is determined: the pronunciation aspect uses an RNN with 2 recurrent layers, and the overall quality aspect uses an RNN with 1 recurrent layer. In the test set experiment, however, the overall quality result differs considerably from the result on the validation set. The thesis analyzes the possible reasons, proposes hypotheses and verifies them through experiments. Lastly, the phoneme embedding neural network is improved in several ways: adding an Attention mechanism, using a 32-dimensional embedding, stacking CNN layers, and applying Dropout. The improved models show varying degrees of improvement in both pronunciation and overall quality.
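
The abstract reports F1 scores for syllable and phoneme segmentation but does not say how predicted boundaries were matched to reference boundaries. A common convention, assumed here and not taken from the thesis, is to count a predicted boundary as correct when it falls within a small time tolerance of a reference boundary that has not already been matched. The function name `boundary_f1`, the 20 ms default tolerance and the sample boundary times are all illustrative choices. A minimal Python sketch under those assumptions:

```python
from typing import Sequence, Tuple


def boundary_f1(reference: Sequence[float],
                predicted: Sequence[float],
                tolerance: float = 0.02) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for boundary times given in seconds.

    Assumption (not from the thesis): a predicted boundary is a hit when it
    lies within `tolerance` seconds of a not-yet-matched reference boundary.
    """
    ref = sorted(reference)
    pred = sorted(predicted)
    hits = 0
    i = j = 0
    # Walk both sorted lists once, pairing boundaries greedily in time order.
    while i < len(ref) and j < len(pred):
        diff = pred[j] - ref[i]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # predicted boundary is too early to match this reference
        else:
            i += 1  # reference boundary has no prediction close enough
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical boundary times in seconds, for illustration only.
reference_times = [0.10, 0.35, 0.60, 0.90]
predicted_times = [0.11, 0.40, 0.61]
precision, recall, f1 = boundary_f1(reference_times, predicted_times)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy input two of the three predicted boundaries fall within the tolerance, so precision is 2/3 and recall is 2/4. The tolerance strongly affects the score, so results are only comparable when the same window is used.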
Keywords/Search Tags: Tibetan-speaker Mandarin, Pronunciation assessment, Syllable and phoneme segmentation, Mispronunciation detection, Pronunciation and overall quality similarity measures