
Research On Chinese Speech Recognition Based On Kaldi

Posted on: 2021-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: W F Zhang
GTID: 2428330605461398
Subject: Software engineering
Abstract:
With the continuous development of artificial intelligence technology, speech has become not only a means of communication between humans but also an important bridge for human-computer interaction. In recent years, speech recognition technology has developed rapidly and has gradually been applied in many fields, and improving the recognition rate of speech recognition systems has become a problem studied by many scholars in China and abroad. With the rise of deep neural networks, traditional acoustic models are gradually being replaced, and acoustic models based on neural networks achieve significantly higher recognition rates.

This thesis studies HMM-based Chinese speech recognition and explains the principles of speech recognition technology in detail. It describes how MFCC and FBank features are extracted and focuses on the acoustic model, covering both the traditional GMM acoustic model and the mainstream DNN neural network model, and analyzes and compares the two in depth. It also identifies the shortcomings of the DNN model: because a DNN cannot model the long-term correlations in speech signals, a new modeling method based on the time-delay neural network (TDNN) is proposed. For the decoder, the thesis describes how a static decoding network is built from weighted finite-state transducers (WFSTs) in the Kaldi toolkit.

The experiments use Kaldi, an open-source speech recognition toolkit written in C++ that supports neural networks and most mainstream algorithms. The experimental data is the 1000-hour AISHELL-2 Chinese speech corpus, on which GMM, DNN, and TDNN models are each trained with Kaldi, and the word error rate (WER) is used to evaluate model performance. The experiments show that the DNN model recognizes continuous speech better than the GMM models: even compared with Tri3, the best-performing GMM model, the DNN reduces the word error rate by 8.52%. The TDNN model in turn outperforms the DNN model. For the same TDNN model, adding pitch and i-vector features to the input lowers the word error rate by a further 1.12%, and the recognition rate on the test set reaches 91.24%. In summary, among neural-network acoustic models, the TDNN, which can model the long-term correlations of speech signals, outperforms the DNN, and adding more effective acoustic features to the input can further improve the performance of a speech recognition system.
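The abstract mentions extracting FBank and MFCC features. The usual pipeline is pre-emphasis, framing into overlapping windows, a power spectrum, a triangular mel filterbank whose log outputs are the FBank features, and a DCT of those log energies whose first coefficients are the MFCCs. The Python sketch below follows that textbook pipeline under assumed parameters (16 kHz audio, 25 ms frames with a 10 ms shift, a Hamming window, a 512-point FFT, 40 mel filters, 13 cepstra). The function names are hypothetical, the thesis does not state its settings, and the code does not reproduce Kaldi's own extractors, which add details such as dithering and use a different default window.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (num_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def fbank_and_mfcc(signal, sample_rate=16000, frame_ms=25.0, shift_ms=10.0,
                   n_fft=512, num_filters=40, num_ceps=13, preemph=0.97):
    """Return (log mel FBank, MFCC) arrays of shape (frames, num_filters) and (frames, num_ceps)."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis boosts high frequencies: y[n] = x[n] - a * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # 2. Split into overlapping frames and apply a Hamming window
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    frame_shift = int(round(sample_rate * shift_ms / 1000.0))
    if len(emphasized) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(num_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each frame (zero-padded to n_fft points)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 4. Mel filterbank energies, then log (floored to avoid log(0)) => FBank
    energies = power @ mel_filterbank(num_filters, n_fft, sample_rate).T
    fbank = np.log(np.maximum(energies, np.finfo(float).eps))
    # 5. Unnormalized DCT-II over the filter axis, keep the first num_ceps => MFCC
    n = np.arange(num_filters)
    dct = np.cos(np.pi / num_filters * (n[None, :] + 0.5) * np.arange(num_ceps)[:, None])
    mfcc = fbank @ dct.T
    return fbank, mfcc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0  # one second at 16 kHz
    audio = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(t.size)
    fb, mf = fbank_and_mfcc(audio)
    print(fb.shape, mf.shape)
```

On one second of 16 kHz audio this yields 98 frames, each with 40 FBank values and 13 MFCCs. The DCT decorrelates the log mel energies, which suits diagonal-covariance GMMs, while neural acoustic models can take the correlated FBank values directly.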
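The abstract contrasts the DNN, which sees only a fixed window of input frames, with the TDNN, which can model longer-range temporal correlation. In a TDNN each layer splices its input at a few relative time offsets and applies one affine transform shared across all time steps, so stacking layers widens the context seen by each output frame. The NumPy sketch below shows only this forward computation, under stated assumptions: the layer offsets are hypothetical values loosely modeled on common TDNN recipes rather than the thesis's configuration, the weights are random, and training, normalization layers, and Kaldi's nnet3 configuration format are out of scope.

```python
import numpy as np


def tdnn_layer(x, weight, bias, offsets):
    """One time-delay layer: splice frames at `offsets`, then affine + ReLU.

    x:       (T, D_in) input frames
    weight:  (len(offsets) * D_in, D_out), shared across all time steps
    bias:    (D_out,)
    offsets: relative frame offsets, e.g. (-1, 0, 1)
    Returns (T - (max(offsets) - min(offsets)), D_out): only frames whose
    whole context lies inside the input are kept (no padding).
    """
    T = x.shape[0]
    lo, hi = min(offsets), max(offsets)
    if T <= hi - lo:
        raise ValueError("input too short for the requested context")
    # For each valid center frame t, concatenate x[t + o] for every offset o
    spliced = np.stack(
        [np.concatenate([x[t + o] for o in offsets]) for t in range(-lo, T - hi)]
    )
    return np.maximum(spliced @ weight + bias, 0.0)


def stacked_demo(T=50, feat_dim=40, hidden=64, seed=0):
    """Hypothetical 3-layer TDNN over random stand-in features."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((T, feat_dim))  # stand-in for MFCC/FBank frames
    layer_offsets = [(-2, -1, 0, 1, 2), (-1, 0, 1), (-3, 0, 3)]
    for offs in layer_offsets:
        w = 0.1 * rng.standard_normal((len(offs) * h.shape[1], hidden))
        h = tdnn_layer(h, w, np.zeros(hidden), offs)
    return h


if __name__ == "__main__":
    print(stacked_demo().shape)
```

With these offsets the three layers trim 4 + 2 + 6 = 12 frames in total, so 50 input frames produce 38 output frames, and each output frame depends on a 13-frame input window (t-6 to t+6) even though no single layer looks more than 3 frames away.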
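The abstract uses word error rate as its evaluation metric. WER is the minimum number of substitutions S, deletions D, and insertions I needed to turn the recognizer's hypothesis into the reference transcript, divided by the number of reference tokens N, that is WER = (S + D + I) / N, computed with a Levenshtein (edit-distance) alignment; it can exceed 100% when the hypothesis contains many insertions. The Python sketch below assumes the transcripts are already tokenized. For Chinese, scoring is often done per character, but the thesis does not say which unit it uses. The function name is hypothetical, and this is not Kaldi's `compute-wer` tool.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return (substitutions + deletions + insertions) / len(reference)."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one token")
    # dist[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference tokens
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis tokens
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[n][m] / n


if __name__ == "__main__":
    ref = list("今天天气很好")  # character-level tokens
    hyp = list("今天天很好啊")  # one deletion (气) and one insertion (啊)
    print(f"WER = {word_error_rate(ref, hyp):.4f}")
```

Here the hypothesis has one deleted and one inserted character against a six-character reference, so the WER is 2/6.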
Keywords: Speech recognition, Acoustic model, Neural network, Kaldi, TDNN