Research On Mongolian Speech Recognition Acoustic Model Based On Deep Learning

Posted on:2019-04-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Wang

GTID:2428330563956735

Subject:Computer Science and Technology

Abstract/Summary:

The acoustic model is one of the core modules of a Mongolian speech recognition system, and its performance directly affects the final recognition result. This thesis focuses on acoustic modeling, model optimization, and training efficiency in Mongolian speech recognition, and studies acoustic modeling methods based on deep learning. First, by analyzing the basic theory and architecture of the Time Delay Neural Network (TDNN) and the Feed-forward Sequential Memory Network (FSMN), the thesis proposes a new TDNN-FSMN mixed model and applies it to the Mongolian speech recognition acoustic model. Experimental results show that the TDNN-FSMN mixed model outperforms both the TDNN model and the FSMN model. Second, after analyzing the structure of the TDNN-Long Short-Term Memory (TDNN-LSTM) mixed model, the thesis applies it to the Mongolian speech recognition acoustic model; the results show that the TDNN-LSTM mixed model achieves better performance on Mongolian speech recognition. To improve performance further, the thesis analyzes the structural characteristics of the Attention Model (AM), proposes an AM-TDNN-LSTM mixed model, and applies it to the Mongolian speech recognition acoustic model. The results show that, compared with the TDNN-LSTM model, the AM-TDNN-LSTM mixed model achieves a markedly lower word error rate and faster decoding. Finally, the thesis studies the Chain model training method, whose cost function is based on the Lattice-Free Maximum Mutual Information (LF-MMI) criterion, and applies it to training the Mongolian speech recognition acoustic model. Compared with the traditional cost function based on the cross-entropy criterion, the decoding rate increases by 1.3 times and the word error rate is relatively lower. Using the AM-TDNN-LSTM mixed model together with the Chain model training method, the Mongolian speech recognition system reaches a word error rate of 7.25%, a relative reduction of 46.4% compared with the Deep Neural Network (DNN) baseline, which brings the system to a practical level.
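
The abstract reports the final word error rate and the relative reduction but not the DNN baseline figure itself. The following is a minimal sketch of the arithmetic behind a relative reduction, assuming the usual definition (baseline minus new, divided by baseline). The function names are hypothetical, and the baseline value it derives (roughly 13.5%) is inferred from the two rounded numbers in the abstract, not reported by the thesis.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, rel_reduction: float) -> float:
    """Invert the definition above to recover the baseline WER."""
    return new_wer / (1.0 - rel_reduction)


# Figures taken from the abstract (both are rounded in the source).
final_wer = 7.25         # percent, AM-TDNN-LSTM with Chain model training
reported_rel = 0.464     # 46.4% relative reduction versus the DNN baseline

# Assumption: this baseline is inferred here, not stated in the abstract.
baseline_wer = implied_baseline(final_wer, reported_rel)

print(f"implied DNN baseline WER: {baseline_wer:.2f}%")
print(f"absolute reduction: {baseline_wer - final_wer:.2f} points")
print(f"relative reduction: {relative_reduction(baseline_wer, final_wer):.1%}")
```

The distinction matters when comparing systems: a 46.4% relative reduction here corresponds to an absolute drop of only about 6.3 percentage points.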

Keywords/Search Tags:

Speech Recognition, Mongolian, Neural Network, Chain Model, Mixed Model, Attention Mechanism

Related items

1	Research And Implementation Of Mongolian-Chinese Mixed Language Speech Recognition System Based On Deep Learning
2	Research On Transfer Learning For Khalkha Mongolian Speech Recognition Acoustic Model
3	The Study On Acoustic Model Based Neural Network In Mongolian Speech Recognition System
4	Research On End-to-end Speech Recognition Based On Deep Learning
5	Research On Speech Emotion Recognition Model Based On Deep Neural Network
6	Research On Mongolian Online Speech Recognition With Scarce Data Set
7	Research And Implementation Of Algorithms For Increasing Speech Recognition Ratio
8	Improved Tacotron2 Speech Synthesis Method Based On Forced Monotonic Attention Mechanism
9	Study And Improve On The Mongolian Speech Recognition System
10	Design And Implementation Of Mongolian Speech Interaction System