
Research On Speech Recognition Based On Convolutional Neural Networks

Posted on: 2018-06-23  Degree: Master  Type: Thesis
Country: China  Candidate: J J Mei  Full Text: PDF
GTID: 2348330512995251  Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, with the successful application of deep neural networks to speech recognition, researchers have begun to explore other network structures. The convolutional neural network (CNN) has attracted wide attention because of its special network structure and strong feature-learning ability, yet its potential for acoustic model building and acoustic feature extraction remains to be explored further. Starting from the basic principles of speech recognition, this thesis studies the application of convolutional neural networks to speech recognition from two entry points: the acoustic model and the acoustic features.

(1) Acoustic modeling based on a deep convolutional neural network. We first compare and analyze the application of the CNN, DNN, and GMM to acoustic modeling in terms of model structure and training algorithm, then demonstrate the feasibility of using a CNN to describe the probability distribution of HMM state outputs, and focus on the performance of the CNN at different network depths. Using the CNTK and Kaldi open-source speech recognition platforms, we build recognition systems based on GMM-HMM, DNN-HMM, and CNN-HMM acoustic models of different depths on a training set of 850 speakers. The experimental results show that the CNN-HMM acoustic model with two convolutional layers achieves a relative reduction in phoneme error rate of 8.29% over the DNN-HMM model and 36.89% over the GMM-HMM model, and that the CNN-HMM model with six convolutional layers achieves a further 8.13% relative reduction over the two-layer CNN-HMM model.

(2) Time-frequency spectrum feature extraction based on a deep convolutional neural network. We analyze two defects of the existing Fbank acoustic features: their design depends heavily on empirical knowledge, and they discard part of the speech information. Based on the physical meaning of the speech spectrum, a parallel time-frequency spectrum feature extraction method based on a deep CNN is proposed. Experiments show that the system built on the time-frequency spectrum features reduces the phoneme error rate by 2.16% relative to the Fbank-based system.
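To make the first contribution concrete, the following is a minimal sketch of a two-convolution-layer CNN acoustic model that maps a context window of filterbank (Fbank) features to HMM tied-state posteriors. It is written in PyTorch for illustration only; the thesis builds its systems with CNTK and Kaldi, and the 11-frame context, 40 Fbank bins, layer sizes, and 2000 tied states here are assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNAcousticModel(nn.Module):
    """Two-convolution-layer CNN that estimates HMM state posteriors
    from a time-frequency patch of Fbank features (illustrative sketch;
    all layer sizes are assumed, not taken from the thesis)."""

    def __init__(self, context_frames=11, num_filterbanks=40, num_hmm_states=2000):
        super().__init__()
        # Convolutions over the time-frequency plane (input treated as a 1-channel image).
        self.conv1 = nn.Conv2d(1, 64, kernel_size=(3, 5))    # (time, frequency) kernel
        self.conv2 = nn.Conv2d(64, 128, kernel_size=(3, 5))
        self.pool = nn.MaxPool2d(kernel_size=(1, 2))          # pool along frequency only
        # Work out the flattened size for the fully connected layers.
        with torch.no_grad():
            dummy = torch.zeros(1, 1, context_frames, num_filterbanks)
            flat = self._features(dummy).numel()
        self.fc1 = nn.Linear(flat, 1024)
        self.out = nn.Linear(1024, num_hmm_states)

    def _features(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        return x

    def forward(self, x):
        # x: (batch, 1, context_frames, num_filterbanks)
        x = self._features(x).flatten(1)
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.out(x), dim=-1)  # per-frame HMM state posteriors

# Example: score 8 frames, each with an 11-frame context of 40-dim Fbank features.
model = CNNAcousticModel()
frames = torch.randn(8, 1, 11, 40)
log_posteriors = model(frames)  # (8, 2000); divided by state priors before HMM decoding
```

In a hybrid CNN-HMM system of this kind, the network replaces the GMM as the estimator of HMM state output probabilities, and deeper variants (e.g. six convolutional layers) are obtained by stacking further convolution-pooling stages before the fully connected part.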
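The second contribution replaces hand-designed Fbank features with features learned directly from the speech spectrum by a deep CNN. The sketch below shows one possible reading of a "parallel" time-frequency extractor: two convolutional branches read the spectrogram, one with kernels elongated along frequency and one along time, and their outputs are combined into a per-frame feature vector. The branch layout, kernel shapes, and output dimension are all assumptions for illustration; the thesis's exact architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class ParallelSpectrumFeatureExtractor(nn.Module):
    """Hypothetical parallel deep-CNN feature extractor: two branches over the
    log power spectrogram, one frequency-oriented and one time-oriented, so the
    learned features are not tied to the hand-crafted Mel filterbank design."""

    def __init__(self, feature_dim=40):
        super().__init__()
        # Frequency-oriented branch: kernels stretched along the frequency axis.
        self.freq_branch = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(3, 9), padding=(1, 4)), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=(3, 9), padding=(1, 4)), nn.ReLU(),
        )
        # Time-oriented branch: kernels stretched along the time axis.
        self.time_branch = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(9, 3), padding=(4, 1)), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=(9, 3), padding=(4, 1)), nn.ReLU(),
        )
        # Project the concatenated branch outputs to a fixed per-frame feature vector.
        self.project = nn.Linear(64, feature_dim)

    def forward(self, spectrogram):
        # spectrogram: (batch, 1, frames, fft_bins), e.g. a log power spectrum
        f = self.freq_branch(spectrogram)   # (batch, 32, frames, fft_bins)
        t = self.time_branch(spectrogram)   # (batch, 32, frames, fft_bins)
        x = torch.cat([f, t], dim=1)        # (batch, 64, frames, fft_bins)
        x = x.mean(dim=3)                   # pool over frequency bins
        x = x.transpose(1, 2)               # (batch, frames, 64)
        return self.project(x)              # (batch, frames, feature_dim)

# Example: 2 utterances, 100 frames each, 257-point spectrum (512-point FFT).
extractor = ParallelSpectrumFeatureExtractor()
spec = torch.randn(2, 1, 100, 257)
features = extractor(spec)  # (2, 100, 40) learned features used in place of Fbank
```

The learned per-frame features then feed the acoustic model in place of Fbank, which is the comparison behind the reported 2.16% relative reduction in phoneme error rate.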
Keywords/Search Tags:Speech Recognition, Acoustic Model, Acoustic Feature, Deep Convolutional Neural Network, Time-frequency Spectrum