
Research On Transfer Learning For Khalkha Mongolian Speech Recognition Acoustic Model

Posted on: 2020-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Shi
Full Text: PDF
GTID: 2428330596992641
Subject: Computer Science and Technology
Abstract/Summary:
Because labeled training data are scarce, acoustic models in low-resource speech recognition systems perform poorly. Transfer learning alleviates this data sparsity to a certain extent: it guides the training of a target-domain (low-resource) model with knowledge learned from a source domain (high-resource). This thesis studies acoustic modeling for the low-resource Khalkha dialect of Mongolian and focuses on improving acoustic model performance with deep learning and transfer learning.

First, exploiting the similarity between the Khalkha and Chahar dialects, this thesis builds a baseline system based on a Time Delay Neural Network (TDNN) that uses an existing Chahar-dialect Mongolian speech recognition system to decode Khalkha speech directly. In addition, it trains a randomly initialized TDNN acoustic model on the available Khalkha data and explores the chain model training method, which is based on a discriminative training criterion, for the acoustic model. Experimental results show that the randomly initialized Khalkha TDNN acoustic model is significantly better than the baseline system, and that chain model training further improves acoustic model performance.

Second, this thesis applies, for the first time, the fine-tuning method of transfer learning to Khalkha Mongolian speech recognition. English and the Chahar dialect serve as source domains, and acoustic models trained on these source domains are used to initialize the parameters of the Khalkha acoustic model. Experimental results show that the fine-tuned acoustic models perform significantly better than both the baseline system and the randomly initialized model, and that using the Chahar dialect as the source domain yields the better acoustic model.

Third, this thesis builds, for the first time, a Khalkha Mongolian speech recognition system based on the weight transfer method. It further verifies the effectiveness of different training strategies, analyzes the transferability of different hidden layers, and compares how the choice of pre-trained model affects transfer performance. Experimental results show that weight transfer significantly reduces the word error rate (WER).

Finally, the best acoustic model is a chain TDNN trained with weight transfer, using the Chahar dialect as the source domain. Its final WER is 15.67%, a relative reduction of 63% and 38% compared with the baseline model and the randomly initialized model, respectively.
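The headline result is given as a WER plus two relative reductions. As a minimal sketch (not code from the thesis), WER is conventionally the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, and a relative reduction is (old − new) / old. The function names `word_error_rate`, `relative_reduction` and `implied_old_wer` are hypothetical helpers introduced only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction: (old - new) / old."""
    return (old_wer - new_wer) / old_wer


def implied_old_wer(new_wer: float, rel_reduction: float) -> float:
    """Invert relative_reduction: old = new / (1 - rel)."""
    return new_wer / (1.0 - rel_reduction)


if __name__ == "__main__":
    # one substitution (b -> x) and one deletion (d) over 4 reference words
    print(word_error_rate("a b c d", "a x c"))
    # back-of-envelope estimates from the rounded figures in the abstract
    print(implied_old_wer(15.67, 0.63), implied_old_wer(15.67, 0.38))
```

The abstract does not report the baseline or random-initialization WERs, so inverting the rounded 63% and 38% only suggests roughly 42% and 25% respectively; these are approximate estimates, not figures from the thesis.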
Keywords/Search Tags: Speech Recognition, Khalkha Mongolian Dialect, Fine-tuning, Weight Transfer, Time Delay Neural Network, Chain Model