Research On Continuous Speech Recognition Based On Convolutional Neural Network

Posted on: 2022-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhou
Full Text: PDF
GTID: 2518306554958449
Subject: Information and Communication Engineering
Abstract/Summary:
Deep learning has made considerable progress in speech recognition, replacing the traditional model-matching method with statistical probability methods. Over the 70 years in which speech recognition technology has developed, recognition of the English digits 0 to 9 has become very accurate, but there is still considerable room for progress in continuous speech recognition. In recent years, improvements in the computing power available for deep learning have made it more convenient to study complex network structures. This paper uses a convolutional neural network (CNN) to study continuous speech recognition, which has both theoretical and practical significance.

Speech recognition uses many kinds of acoustic models, of which the traditional one is GMM-HMM. GMM-HMM trains quickly and yields a small acoustic model, but it does not make full use of the context information of each frame and requires a large amount of computation. When modeling with monophones (single phonemes) and triphones, GMM-HMM performs poorly on continuous speech because of the large number of phonemes. In addition, the traditional acoustic feature input is MFCC, which is a linear result obtained through the DCT. However, the spectral response of the human auditory system to sound is not linear, and the DCT increases the computational cost of the whole recognition process. To make full use of frame context and preserve more of the real and valuable acoustic feature information, this paper proposes replacing GMM-HMM with a CNN for continuous speech recognition and using FBank acoustic features as the CNN input.

For the study of GMM-HMM as the acoustic model, a Kaldi environment was configured on a virtual machine, the English TIMIT corpus and the Chinese THCHS30 corpus were selected as data sets, and MFCC was used as the acoustic feature input. The monophone and triphone models were compared. The results show that the triphone model recognizes better than the monophone model, and that the GMM-HMM recognition rate on the Chinese THCHS30 corpus is lower than on the English TIMIT corpus.

For the study of continuous speech with the CNN, a TensorFlow environment was configured, the Chinese THCHS30 corpus was selected for the experiments, and the parameters of each layer of the CNN were configured. The effects of MFCC and FBank acoustic features were compared. The results show that the CNN achieves a higher recognition rate with FBank than with MFCC, and that the CNN reduces its parameter count through weight sharing. On the THCHS30 corpus, the 9-layer CNN has an error rate of 27.73%, compared with 50.88% for the monophone model and 35.97% for the triphone model under GMM-HMM. The convolutional neural network therefore improves the recognition rate of continuous speech.
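
The relationship between the two feature types compared in the abstract can be illustrated with a minimal sketch. In the standard pipeline, FBank features are log Mel filterbank energies, and MFCCs are obtained by applying a DCT to those energies and keeping the first few coefficients. The code below is not the thesis's implementation: the function names `dct_ii` and `mfcc_from_fbank` are hypothetical, the frame values are made up, and framing, windowing, and Mel filtering are assumed to have happened already.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a sequence of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(
            x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i in range(n)
        )
        # Scaling that makes the transform orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def mfcc_from_fbank(log_fbank, num_ceps):
    """Keep the first num_ceps DCT coefficients of one FBank frame."""
    return dct_ii(log_fbank)[:num_ceps]


# Hypothetical log Mel filterbank energies for one frame (made-up values).
frame = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.5]

# FBank input: the frame is used directly, one value per filter.
fbank_features = frame

# MFCC input: an extra DCT step followed by truncation.
mfcc_features = mfcc_from_fbank(frame, 4)
```

Truncating the DCT output discards the higher-order coefficients, so an MFCC frame carries less of the filterbank detail than the FBank frame it was derived from, and the DCT itself is an additional computation per frame.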
Keywords/Search Tags: Convolutional neural network, Acoustic features, Acoustic model, Continuous speech recognition