
Research On Speech Recognition Based On Convolutional Neural Networks

Posted on: 2018-06-23  Degree: Master  Type: Thesis
Country: China  Candidate: J J Mei  Full Text: PDF
GTID: 2348330512995251  Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, with the successful application of deep neural networks to speech recognition, researchers have begun to explore other network structures. The convolutional neural network (CNN) has attracted wide attention because of its special network structure and strong feature-learning ability, yet its potential for acoustic model building and acoustic feature extraction remains to be explored further. Starting from the basic principles of speech recognition, this thesis studies the application of convolutional neural networks to speech recognition from two entry points: the acoustic model and the acoustic features.

(1) Acoustic modeling based on a deep convolutional neural network. We first compare and analyze the application of the CNN, DNN, and GMM to acoustic modeling in terms of model structure and training algorithm, then demonstrate the feasibility of using a CNN to describe the probability distribution of HMM state outputs, and focus on the performance of the CNN at different network depths. Using the CNTK and Kaldi open-source speech recognition platforms, we build recognition systems based on GMM-HMM, DNN-HMM, and CNN-HMM acoustic models of different depths on a training set of 850 speakers. The experimental results show that the CNN-HMM acoustic model with two convolutional layers achieves a relative reduction in phoneme error rate of 8.29% over the DNN-HMM model and 36.89% over the GMM-HMM model, and that the CNN-HMM model with six convolutional layers achieves a further 8.13% relative reduction over the two-layer CNN-HMM model.

(2) Time-frequency spectrum feature extraction based on a deep convolutional neural network. We analyze two defects of the existing Fbank acoustic features: their design depends heavily on empirical knowledge, and they discard part of the speech information. Based on the physical meaning of the speech spectrum, a parallel time-frequency spectrum feature extraction method based on a deep CNN is proposed. Experiments show that the system built on the time-frequency spectrum features reduces the phoneme error rate by 2.16% relative to the Fbank-based system.
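To make the first contribution concrete, the following is a minimal sketch of a two-convolution-layer CNN acoustic model that maps a context window of filterbank (Fbank) features to HMM tied-state posteriors. It is written in PyTorch for illustration only; the thesis builds its systems with CNTK and Kaldi, and the 11-frame context, 40 Fbank bins, layer sizes, and 2000 tied states here are assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNAcousticModel(nn.Module):
    """Two-convolution-layer CNN that estimates HMM state posteriors
    from a time-frequency patch of Fbank features (illustrative sketch;
    all layer sizes are assumed, not taken from the thesis)."""

    def __init__(self, context_frames=11, num_filterbanks=40, num_hmm_states=2000):
        super().__init__()
        # Convolutions over the time-frequency plane (input treated as a 1-channel image).
        self.conv1 = nn.Conv2d(1, 64, kernel_size=(3, 5))    # (time, frequency) kernel
        self.conv2 = nn.Conv2d(64, 128, kernel_size=(3, 5))
        self.pool = nn.MaxPool2d(kernel_size=(1, 2))          # pool along frequency only
        # Work out the flattened size for the fully connected layers.
        with torch.no_grad():
            dummy = torch.zeros(1, 1, context_frames, num_filterbanks)
            flat = self._features(dummy).numel()
        self.fc1 = nn.Linear(flat, 1024)
        self.out = nn.Linear(1024, num_hmm_states)

    def _features(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        return x

    def forward(self, x):
        # x: (batch, 1, context_frames, num_filterbanks)
        x = self._features(x).flatten(1)
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.out(x), dim=-1)  # per-frame HMM state posteriors

# Example: score 8 frames, each with an 11-frame context of 40-dim Fbank features.
model = CNNAcousticModel()
frames = torch.randn(8, 1, 11, 40)
log_posteriors = model(frames)  # (8, 2000); divided by state priors before HMM decoding
```

In a hybrid CNN-HMM system of this kind, the network replaces the GMM as the estimator of HMM state output probabilities, and deeper variants (e.g. six convolutional layers) are obtained by stacking further convolution-pooling stages before the fully connected part.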
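The second contribution replaces hand-designed Fbank features with features learned directly from the speech spectrum by a deep CNN. The sketch below shows one possible reading of a "parallel" time-frequency extractor: two convolutional branches read the spectrogram, one with kernels elongated along frequency and one along time, and their outputs are combined into a per-frame feature vector. The branch layout, kernel shapes, and output dimension are all assumptions for illustration; the thesis's exact architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class ParallelSpectrumFeatureExtractor(nn.Module):
    """Hypothetical parallel deep-CNN feature extractor: two branches over the
    log power spectrogram, one frequency-oriented and one time-oriented, so the
    learned features are not tied to the hand-crafted Mel filterbank design."""

    def __init__(self, feature_dim=40):
        super().__init__()
        # Frequency-oriented branch: kernels stretched along the frequency axis.
        self.freq_branch = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(3, 9), padding=(1, 4)), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=(3, 9), padding=(1, 4)), nn.ReLU(),
        )
        # Time-oriented branch: kernels stretched along the time axis.
        self.time_branch = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(9, 3), padding=(4, 1)), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=(9, 3), padding=(4, 1)), nn.ReLU(),
        )
        # Project the concatenated branch outputs to a fixed per-frame feature vector.
        self.project = nn.Linear(64, feature_dim)

    def forward(self, spectrogram):
        # spectrogram: (batch, 1, frames, fft_bins), e.g. a log power spectrum
        f = self.freq_branch(spectrogram)   # (batch, 32, frames, fft_bins)
        t = self.time_branch(spectrogram)   # (batch, 32, frames, fft_bins)
        x = torch.cat([f, t], dim=1)        # (batch, 64, frames, fft_bins)
        x = x.mean(dim=3)                   # pool over frequency bins
        x = x.transpose(1, 2)               # (batch, frames, 64)
        return self.project(x)              # (batch, frames, feature_dim)

# Example: 2 utterances, 100 frames each, 257-point spectrum (512-point FFT).
extractor = ParallelSpectrumFeatureExtractor()
spec = torch.randn(2, 1, 100, 257)
features = extractor(spec)  # (2, 100, 40) learned features used in place of Fbank
```

The learned per-frame features then feed the acoustic model in place of Fbank, which is the comparison behind the reported 2.16% relative reduction in phoneme error rate.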
Keywords/Search Tags:Speech Recognition, Acoustic Model, Acoustic Feature, Deep Convolutional Neural Network, Time-frequency Spectrum