Research On Transfer Learning For Khalkha Mongolian Speech Recognition Acoustic Model

Posted on:2020-10-13

Degree:Master

Type:Thesis

Country:China

Candidate:L Y Shi

Full Text:PDF

GTID:2428330596992641

Subject:Computer Science and Technology

Abstract/Summary:

Because labeled training data are scarce, acoustic models in low-resource speech recognition systems perform poorly. Transfer learning alleviates this data sparseness problem to a certain extent: it guides the training of the target-domain (low-resource) model with knowledge learned from the source domain (high-resource). This thesis studies acoustic modeling for the low-resource Khalkha dialect of Mongolian and focuses on improving the acoustic model with deep learning and transfer learning. First, based on the similarity between the Khalkha and Chahar dialects, this thesis proposes a baseline system built on a Time Delay Neural Network (TDNN), which uses a Mongolian speech recognition system trained on the Chahar dialect to decode Khalkha speech directly. In addition, this thesis trains a randomly initialized TDNN acoustic model on the existing Khalkha dialect data and explores how the chain model training method, which is based on a discriminative training criterion, performs for the acoustic model. The experimental results show that the randomly initialized TDNN acoustic model trained on Khalkha data is significantly better than the baseline system, and that chain model training further improves the acoustic model. Second, this thesis applies, for the first time, the fine-tuning method from transfer learning to a Khalkha dialect Mongolian speech recognition system. English and the Chahar dialect are used as source domains, and the acoustic models trained on these source domains are used to initialize the parameters of the Khalkha acoustic model. The experimental results show that the fine-tuned acoustic model is significantly better than both the baseline system and the randomly initialized model, and that using the Chahar dialect as the source domain gives the better acoustic model. Third, this thesis builds, for the first time, a Khalkha dialect Mongolian speech recognition system based on the weight transfer method. Furthermore, it verifies the effectiveness of different training strategies, analyzes the transferability of different hidden layers, and compares the impact of the pre-trained model on transfer performance. The experimental results show that the model based on weight transfer significantly reduces the word error rate (WER). Finally, the optimal acoustic model is a chain TDNN trained with the weight transfer method and with the Chahar dialect as the source domain. Its final WER is 15.67%, a relative reduction of 63% and 38% compared with the baseline model and the randomly initialized model, respectively.
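The abstract reports results as a word error rate and as relative reductions of that rate. As a rough illustration of how these quantities relate, here is a minimal Python sketch. It uses the standard word-level edit-distance definition of WER; the function names are hypothetical and are not taken from the thesis, and the thesis does not state which scoring tool it used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction of new_wer with respect to old_wer."""
    return (old_wer - new_wer) / old_wer


def implied_reference_wer(new_wer: float, rel_reduction: float) -> float:
    """Back-calculate the WER that new_wer was compared against.

    Assumption: rel_reduction is a rounded figure, so the result is
    only an approximation of the original value.
    """
    return new_wer / (1.0 - rel_reduction)


if __name__ == "__main__":
    # Figures quoted in the abstract: final WER 15.67%, relative
    # reductions of 63% (baseline) and 38% (random initialization).
    print(round(implied_reference_wer(15.67, 0.63), 2))
    print(round(implied_reference_wer(15.67, 0.38), 2))
```

Under these assumptions, the quoted figures imply a baseline WER of roughly 42% and a random-initialization WER of roughly 25%. These are back-calculations from rounded percentages, not numbers stated in the abstract.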

Keywords/Search Tags:

Speech Recognition, Khalkha Mongolian Dialect, Fine-tuning, Weight Transfer, Time Delay Neural Network, Chain Model

Related items

1	Research On Mongolian Speech Recognition Acoustic Model Based On Deep Learning
2	Application Research Of Deep Learning In Speech Recognition Of Sichuan Dialect
3	The Study On Acoustic Model Based Neural Network In Mongolian Speech Recognition System
4	Time Delay Neural Network Based Automatic Speech Recognition
5	Noise Robust Speech Recognition Based On CNN-TDNN And Transfer Learning
6	Research On Mandarin Speech Recognition Technology Based On Deep Neural Network
7	Research On Chinese Dialect Recognition Based On Attention And Transfer Learning
8	Research And Implementation Of Mongolian-Chinese Mixed Language Speech Recognition System Based On Deep Learning
9	Research On Dialect Accent Classification Based On Deep Learning
10	Research On Unsupervised Domain Adaptation Of Mongolian-Chinese Machine Translation Model Based On Fine Tuning