| In an era of rapid development in artificial intelligence, people continue to overcome technical difficulties and improve human-machine interaction. Speech recognition is the basis of speech interaction, so it has become a research hotspot for scholars at home and abroad. With the rise of neural networks and improvements in computer performance, researchers have applied neural network models to speech recognition, which has significantly improved recognition accuracy and promoted the commercial application of speech recognition products. This thesis studies Chinese speech recognition based on Kaldi. It explains the basic principles of speech recognition and the methods of speech signal feature extraction, and introduces the WFST decoding mechanism of the open-source toolkit Kaldi. The thesis focuses on neural-network-based acoustic models and language models. For the acoustic model, the principles and algorithms of GMM-HMM are analyzed. Because GMM-HMM cannot use the contextual information of a frame, a DNN with stronger expressive and modeling ability is introduced. Because a DNN in turn cannot model the long-term correlation of speech signals, the TDNN modeling method is used, and the neural network models are trained with discriminative training (DT). The experiments use the thchs30 speech data set. With the help of Kaldi, maximum likelihood estimation (MLE) is used to train the GMM and DNN models, DT is used to train the DNN and TDNN models, and word error rate is used as the evaluation metric. The results on the test set show that the DNN recognizes better than tri3b, the best-performing GMM model, and its word error rate decreases by 5.8%; that DT training improves the performance of the DNN model; and that every TDNN model performs better than the DNN model. Among the TDNN models, the
tdnn_1b model, in which i-vector features are added to the input features, has a relatively low word error rate and relatively good performance in this thesis. These results show that neural networks, with their stronger learning ability, achieve a higher recognition rate than the traditional acoustic model; that among neural-network-based acoustic models, the TDNN, which can model the long-term correlation of speech signals, performs better than the DNN; and that, compared with the traditional MLE training method, DT increases the classification ability of the model and improves system performance. For the language model, the general N-gram language model and its smoothing algorithms are introduced. To address the data sparsity problem of the N-gram language model, an RNN, which can better describe the relationships between statements, is used to train the language model. Measured by perplexity on the test set, the RNNLM scores lower than the 4-gram model, indicating that the RNNLM, which can make full use of historical information, performs better; the RNN model with 300 neurons in the hidden layer has the lowest perplexity and the best performance in this thesis. In the final decoding and recognition stage, RNN-based N-best rescoring is applied to the baseline system built on tdnn_1b and a 3-gram language model, and the resulting word error rate is lower than that of the baseline system, showing that system performance improves to a certain extent. |
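
The two evaluation metrics named in this abstract, word error rate and perplexity, can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the Kaldi scoring tools used in the thesis; the function names `word_error_rate` and `perplexity` are hypothetical, and the inputs are assumed to be already tokenized.

```python
import math


def word_error_rate(reference, hypothesis):
    """Edit distance between two token lists, divided by the reference length.

    Assumption: substitutions, deletions, and insertions all cost 1.
    """
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities under a language model."""
    if not log_probs:
        raise ValueError("log_probs must contain at least one value")
    return math.exp(-sum(log_probs) / len(log_probs))


if __name__ == "__main__":
    ref = ["a", "b", "c", "d"]
    hyp = ["a", "x", "c"]
    print(word_error_rate(ref, hyp))
    print(perplexity([math.log(0.25)] * 4))
```

A lower value is better for both metrics. In the N-best rescoring setting described above, a function like `word_error_rate` would be applied to the top hypothesis before and after the language model scores are replaced.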