
Research On Large Vocabulary Continuous Speech Recognition Based On Deep Learning

Posted on: 2019-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2428330590465876
Subject: Electronic Science and Technology
Abstract/Summary:
The main purpose of automatic speech recognition is to let a machine "understand" what people are saying by converting the speech signal into text, so that people can communicate with machines quickly and without barriers. In recent years, with the widespread application of deep learning, the DNN-HMM architecture has replaced the traditional GMM-HMM architecture as the mainstream design for large vocabulary continuous speech recognition systems. This thesis builds on deep learning and studies two aspects in depth, feature extraction and the acoustic model, and the work has both theoretical significance and research value.

Firstly, the state of research on speech recognition technology in China and abroad is reviewed. The theoretical foundations of deep learning and the key technologies of speech recognition are introduced, and the overall scheme of a large vocabulary continuous speech recognition system based on deep learning is designed. The deficiencies of the original acoustic feature extraction and of the DNN-HMM acoustic model are analyzed in detail. This analysis establishes speech feature extraction and acoustic model optimization as the key technologies studied in this thesis.

Secondly, common speech features such as MFCC, Fbank, and bottleneck features cannot fully capture information from the preceding and following frames of speech, which leads to a low recognition rate. To address this problem, an improved method for extracting speech bottleneck features, based on an overlapping group lasso sparse deep neural network, is proposed. The method uses the overlapping group lasso algorithm to improve the DNN and extracts, from MFCC acoustic features, bottleneck features that carry inter-frame correlation information. Experimental results show that the speech recognition rate obtained with the DNN improves significantly.

Then, to address the vanishing gradient and overfitting problems in DBLSTM, Maxout neurons and the Dropout regularization algorithm are used to improve the DBLSTM-HMM acoustic model. To accommodate the bidirectional dependence of DBLSTM on speech information at each time step, the CSC-BPTT training algorithm is further proposed for training the DBLSTM neural network. Experimental results show that the improved DBLSTM-HMM acoustic model outperforms DNN-HMM, RNN-HMM, and other typical acoustic models, and that speech recognition performance improves greatly.

Finally, a large vocabulary continuous speech recognition system based on DBLSTM-HMM is built from the improved feature extraction method and acoustic model. Experiments are carried out on the THCHS-30 Chinese corpus and a self-made corpus. The results show that, compared with the traditional DNN-HMM baseline system, the system built in this thesis has a WER that is 7.44% lower, generalizes better, and achieves a higher speech recognition rate.
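
The abstract reports its final result as WER (word error rate). For readers unfamiliar with the metric, the sketch below shows the standard definition: the token-level edit distance between a reference transcript and the recognizer output (substitutions, deletions, and insertions), divided by the number of reference tokens. The function name word_error_rate and the example sentences are illustrative assumptions, not taken from the thesis, and the thesis's own scoring tool and tokenization are not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Return (S + D + I) / N for two token sequences.

    Hypothetical helper for illustration only. `reference` and
    `hypothesis` are sequences of tokens (words or characters).
    """
    ref = list(reference)
    hyp = list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j tokens of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference token missing
                curr[j - 1] + 1,     # insertion: extra hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    # One substitution (sat -> sit) and one deletion (the): 2 / 6.
    print(round(word_error_rate(ref, hyp), 4))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. For Chinese corpora the same routine can be applied to character sequences instead of word sequences, in which case it yields a character error rate.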
Keywords/Search Tags: large vocabulary continuous speech recognition, deep learning, speech bottleneck feature, DBLSTM