Research On Speech Recognition Based On Deep Neural Network

Posted on: 2019-11-09

Degree: Master

Type: Thesis

Country: China

Candidate: J Wang

Full Text: PDF

GTID: 2428330545454444

Subject: Computer application technology

Abstract/Summary:

Speech is the most convenient way for people to communicate, so speech recognition has long been a hot research topic. Since deep learning became popular, neural-network-based speech recognition has become standard in both academia and industry. Driven by deep learning, speech recognition has also proven highly practical in smart homes, input methods, translators, and voice control, which makes the ability to design a speech recognition system increasingly necessary. This thesis studies speech recognition systems based on deep neural networks.

In the acoustic model part, Kaldi is used as the training tool, and 40-dimensional MFCC features are extracted to train the baseline models. A monophone model is trained first, and a triphone model is then trained with decision-tree state tying. The recognition results verify that the triphone model outperforms the monophone model, with an improvement of about 14%. To reduce the impact of speaker variation on the recognition results, the features are then further processed with techniques such as linear discriminant analysis and speaker adaptation, which improves the final recognition result by approximately 8.4%. Building on the baseline model, a deep neural network is trained on the state alignment information to provide posterior probabilities for the hidden Markov model. The recognition results verify that DNN-HMM acoustic modeling is superior to the traditional GMM-HMM method. Finally, the same network model is trained on two training sets with different data volumes; the recognition results obtained with the two training sets differ by 1.1%.

In the language model part, the SRILM language model training toolkit is first used to analyze how the n-gram scores of a statistical language model are computed. Two branch models are then trained and combined into a single language model by interpolation. Finally, the branch models and the interpolated (common) model are compared on the recognition results, and their pros and cons are analyzed. The comparison shows that, for a test set biased toward one of the branch language models, the uninterpolated branch model performs better than the interpolated model.
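
The interpolation finding can be illustrated with a minimal sketch. It assumes simple linear interpolation of per-word probabilities with a single weight `lam` and uses perplexity as a stand-in for recognition quality. The probability values, the weight, and the function names are hypothetical and are not taken from the thesis or from SRILM.

```python
import math


def interpolate(p_a, p_b, lam):
    """Linearly interpolate two model probabilities: lam * p_a + (1 - lam) * p_b."""
    return lam * p_a + (1.0 - lam) * p_b


def perplexity(word_probs):
    """Perplexity of a word sequence given the probability assigned to each word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Hypothetical per-word probabilities for one test sentence that is
# biased toward branch model A (illustrative numbers only).
probs_a = [0.20, 0.10, 0.25, 0.05]   # branch model A (matches the test data)
probs_b = [0.02, 0.01, 0.05, 0.01]   # branch model B (mismatched)

lam = 0.5  # assumed interpolation weight for model A
probs_mix = [interpolate(a, b, lam) for a, b in zip(probs_a, probs_b)]

ppl_a = perplexity(probs_a)
ppl_mix = perplexity(probs_mix)

print(f"perplexity, branch A only: {ppl_a:.2f}")
print(f"perplexity, interpolated:  {ppl_mix:.2f}")
```

In this toy case the mismatched branch assigns a lower probability to every word, so mixing it in lowers each word's probability and raises the perplexity. That is consistent with the observation that an uninterpolated branch model can do better on a test set biased toward that branch. Perplexity is only a proxy here; the thesis compares recognition results.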

Keywords/Search Tags:

Speech recognition, Deep neural network, Acoustic modeling, DNN-HMM, Language modeling, N-gram

Related items

1	Structured Recurrent Neural Network And Its Applications In Automatic Speech Recognition
2	Research On Speech Recognition Method Based On Deep Learning
3	Multi-modal Speech Recognition Based On Deep Neural Network
4	Research On BN Feature Based Acoustic Modeling And Its Application In Keyword Retrieval
5	Research On Acoustic Modeling In Low Resource Speech Recognition Based On Transfer Learning
6	Research And Application Of Speech Recognition Technology Based On Deep Neural Network
7	Research On Uyghur Speech Recognition Based On Deep Learning
8	Amdo Tibetan Speech Recognition Based On Deep Neural Network
9	Research On Acoustic Modeling For Speech Recognition Based On Deep Neural Networks
10	Research On Acoustic Modeling Of Speech Recognition Based On Recurrent Neural Network