Research On Speech Recognition Based On Deep Neural Network

Posted on: 2019-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2428330545454444
Subject: Computer application technology
Abstract/Summary:
Speech is the most convenient way for people to communicate with each other, and speech recognition has long been a hot research topic. Since deep learning became popular, speech recognition based on neural networks has become standard in both academia and industry. Driven by deep learning, speech recognition has also proven highly practical in smart homes, input methods, translators, and voice control, so being able to design a speech recognition system has become important. This thesis studies speech recognition systems based on deep neural networks.

In the acoustic model part, Kaldi is used as the training tool, and 40-dimensional MFCC features are extracted to train the baseline models. A monophone model is trained first, and a triphone model is then trained with decision-tree state tying. The recognition results verify that the triphone structure outperforms the monophone structure, with an improvement of about 14%. To reduce the impact of different speakers on the recognition result, the features are then further processed with techniques such as linear discriminant analysis and speaker adaptation, which improves the final recognition result by approximately 8.4%. On top of the baseline model, a deep neural network is trained from the state alignment information to provide posterior probabilities for the hidden Markov model. The recognition results verify that the DNN-HMM acoustic modeling method is superior to the traditional GMM-HMM method. Finally, the same network model is trained on two training sets with different data volumes; the recognition result for the larger training set is 1.1% better than that for the smaller one.

In the language model part, the SRILM language model training tool is first used to analyze how the n-gram score of a statistical language model is computed. Two branch models are then trained and combined into a single language model by interpolation. Finally, the pros and cons of the branch models and the combined model are analyzed against the recognition results. The comparison shows that, for a test set biased towards one branch of the language model, the uninterpolated branch model performs better than the interpolated model.
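The language model finding can be illustrated with a toy example. This is a minimal sketch, not the thesis's setup: it assumes add-one smoothed unigram models instead of SRILM n-gram models, a fixed interpolation weight of 0.5, and two tiny made-up corpora. All names in it (`train_unigram`, `interpolate`, `perplexity`, `domain_a`, `domain_b`) are hypothetical. It linearly interpolates two branch models and compares perplexity on a test sentence drawn from one branch's domain.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one smoothed unigram model over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(model_a, model_b, lam):
    """Linear interpolation: P(w) = lam * P_a(w) + (1 - lam) * P_b(w)."""
    return {w: lam * model_a[w] + (1 - lam) * model_b[w] for w in model_a}


def perplexity(model, tokens):
    """Per-token perplexity of a token sequence under a unigram model."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Two made-up "branch" corpora (assumption: stand-ins for two text domains).
domain_a = "turn on the light turn off the light".split()
domain_b = "translate the sentence translate the word".split()
vocab = sorted(set(domain_a) | set(domain_b))

model_a = train_unigram(domain_a, vocab)
model_b = train_unigram(domain_b, vocab)
mixed = interpolate(model_a, model_b, lam=0.5)  # assumed weight

# A test sentence biased towards branch A.
test_a = "turn on the light".split()
ppl_a = perplexity(model_a, test_a)
ppl_mixed = perplexity(mixed, test_a)

print(f"branch A perplexity:     {ppl_a:.2f}")
print(f"interpolated perplexity: {ppl_mixed:.2f}")
```

In this toy case the branch A model gives lower perplexity on the branch A test sentence than the interpolated model, because interpolation moves probability mass towards words that only the other domain uses. In practice the interpolation weight would normally be tuned on held-out data rather than fixed.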
Keywords/Search Tags: Speech recognition, Deep neural network, Acoustic modeling, DNN-HMM, Language modeling, N-gram