Application Of Convolutional Neural Network In Large Vocabulary Continuous Speech Recognition

Posted on:2019-05-10

Degree:Master

Type:Thesis

Country:China

Candidate:W S Ke

Full Text:PDF

GTID:2428330563991560

Subject:Information and Communication Engineering

Abstract/Summary:

After many years of development, speech recognition technology has made great progress. Isolated-word recognition has been highly successful, with near-perfect recognition rates. However, there is still room for improvement in large vocabulary continuous speech recognition (LVCSR). In recent years, deep learning has been widely applied to LVCSR. This thesis studies the application of convolutional neural networks (CNNs) to LVCSR, a topic of both theoretical and practical significance.

The thesis first describes the research background and current state of speech recognition technology and reviews the relevant background on speech recognition and artificial neural networks, including the basic principles of speech recognition, the components of a speech recognition system, and the backpropagation (BP) algorithm and training procedure of the convolutional neural network. Second, it describes the difficulties of LVCSR, analyzes the advantages of CNNs for this task, and builds an LVCSR system. It focuses on the CNN structure used in LVCSR, analyzes the characteristics of each layer, and explains how the parameters of each layer are designed.

Finally, the English speech corpus TIMIT and the Chinese speech corpus thchs30 are used to evaluate the CNN in LVCSR. A comparison of FBANK features with the widely used MFCC features shows that FBANK features yield a lower word error rate in the CNN model. By tuning the convolution kernel size and the pooling region size, the network structure is optimized and a model with a relatively low word error rate is obtained. On TIMIT, the word error rate after optimization is 19.1%, compared with 32.7% for the monophone model and 25.6% for the triphone model of the GMM-HMM method. On thchs30, the word error rate after optimization is 27.34%, compared with 50.88% for the monophone model and 35.97% for the triphone model of the GMM-HMM method. In both cases the CNN model improves recognition accuracy. Online recognition was also used in the experiments to demonstrate the optimized CNN and its performance improvement in LVCSR.

The research shows that CNNs can reduce the word error rate in LVCSR, but much work remains, such as training on large-scale data to further improve recognition performance.
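As a rough illustration of the metric behind these figures, the sketch below computes a word error rate as the word-level edit distance divided by the reference length, and the relative reduction between two error rates. The function names `word_error_rate` and `relative_reduction` are illustrative assumptions and are not taken from the thesis; the abstract does not say which scoring tool was used. The only numbers used are those quoted in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (sub + del + ins) divided by reference length.

    Illustrative helper; not the scoring tool used in the thesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Error rates (in percent) quoted in the abstract.
    timit = relative_reduction(25.6, 19.1)      # triphone GMM-HMM vs. CNN
    thchs30 = relative_reduction(35.97, 27.34)  # triphone GMM-HMM vs. CNN
    print(f"TIMIT relative reduction:   {timit:.1%}")
    print(f"thchs30 relative reduction: {thchs30:.1%}")
```

Under this definition, moving from the triphone GMM-HMM model to the CNN corresponds to a relative word error rate reduction of roughly 25% on TIMIT and 24% on thchs30.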

Keywords/Search Tags:

Large vocabulary continuous speech recognition (LVCSR), Convolutional neural network (CNN), Acoustic feature, Acoustic model

Related items

1	Deep Neural Network Based Acoustic Feature Extraction For LVCSR Systems
2	Discriminative Training For Large Vocabulary Continuous Speech Recognition
3	Research On Continuous Speech Recognition Based On Convolutional Neural Network
4	Research On Speech Recognition Based On Convolutional Neural Networks
5	Real-time speaker-independent large vocabulary continuous speech recognition
6	A Study Of An Irrelevant Variability Normalization Based Large Vocabulary Continuous Speech Recognition
7	A Generic, Scalable Architecture for a Large Acoustic Model and Large Vocabulary Speech Recognition Accelerator
8	Research On Discriminative Techniques Of Feature Extraction And Acoustic Model Training In Continuous Speech Recognition
9	Acoustic Modeling For Continuous Speech Recognition
10	The Study On Acoustic Model Based Neural Network In Mongolian Speech Recognition System