
Research On Mongolian Speech Recognition Acoustic Model Based On Deep Learning

Posted on: 2019-04-15  Degree: Master  Type: Thesis
Country: China  Candidate: Y H Wang  Full Text: PDF
GTID: 2428330563956735  Subject: Computer Science and Technology
Abstract/Summary:
The acoustic model is one of the core modules of a Mongolian speech recognition system, and its performance directly affects the final recognition result. This thesis focuses on acoustic modeling, model optimization, and training efficiency in Mongolian speech recognition, and studies acoustic modeling methods based on deep learning. Firstly, by analyzing the basic theory and architecture of the Time Delay Neural Network (TDNN) and the Feed-forward Sequential Memory Network (FSMN), this thesis proposes a new TDNN-FSMN mixed model for the first time and applies it to the Mongolian speech recognition acoustic model. Experimental results show that the TDNN-FSMN mixed model outperforms both the TDNN model and the FSMN model. Secondly, the structure of the TDNN-Long Short Term Memory (TDNN-LSTM) mixed model is analyzed and the model is applied to the Mongolian speech recognition acoustic model. The results show that the TDNN-LSTM mixed model achieves better performance for Mongolian speech recognition. To further improve recognition performance, this thesis analyzes the structural characteristics of the Attention Model (AM), proposes an AM-TDNN-LSTM mixed model, and applies it to the Mongolian speech recognition acoustic model. The results show that, compared with the TDNN-LSTM model, the AM-TDNN-LSTM mixed model achieves a markedly lower word error rate and faster decoding. Finally, this thesis studies the Chain model training method, whose cost function is based on the Lattice-Free Maximum Mutual Information (LF-MMI) criterion, and applies it to training the Mongolian speech recognition acoustic model. Compared with the traditional cost function based on the cross-entropy criterion, the decoding rate is increased by 1.3 times and the word error rate is lower. Using the AM-TDNN-LSTM mixed model together with the Chain model training method, the word error rate of the Mongolian speech recognition system is 7.25%, a relative reduction of 46.4% compared with the Deep Neural Network (DNN) baseline, which reaches a practical level.
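The relative reduction quoted above can be unpacked with a little arithmetic. The abstract does not state the DNN baseline word error rate, so the sketch below only back-computes the value implied by the two reported figures (7.25% and 46.4%); the function names are illustrative assumptions, and the result is approximate because the reported numbers are rounded.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: the fraction of the baseline error that was removed."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, reduction: float) -> float:
    """Invert the formula above to recover the baseline WER (an inference, not a reported value)."""
    return new_wer / (1.0 - reduction)


final_wer = 7.25    # percent, as reported in the abstract
reduction = 0.464   # 46.4% relative reduction, as reported in the abstract

# Baseline implied by the two reported numbers (not stated in the abstract itself).
baseline = implied_baseline(final_wer, reduction)
print(f"Implied DNN baseline WER: {baseline:.2f}%")
print(f"Check: relative reduction = {relative_reduction(baseline, final_wer):.1%}")
```

Under these assumptions the implied DNN baseline is roughly 13.5% word error rate, which corresponds to an absolute reduction of about 6.3 percentage points.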
Keywords/Search Tags: Speech Recognition, Mongolian, Neural Network, Chain Model, Mixed Model, Attention Mechanism