Chinese Speech Recognition Based On Deep Convolution Neural Networks

Posted on: 2020-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: J H Liu
Full Text: PDF
GTID: 2428330596486062
Subject: Control Science and Engineering
Abstract/Summary:
Language and speech are the most important and direct means of human communication, and they play an irreplaceable role in daily life. With the development of deep learning and the continuous advancement of artificial intelligence, expectations for speech recognition keep rising, which has driven a series of research and development efforts on speech recognition systems. Chinese speech recognition is one of its important branches. Deep Learning (DL), the machine learning approach that has attracted the most attention in recent years, has achieved remarkable results in many fields, such as speech recognition and image processing. Chinese contains not only a large number of synonyms and homonyms but also consonants and tones, all of which must be taken into account during recognition. As a result, training is difficult and recognition performance is not ideal. In addition, Chinese, our mother tongue, has the largest number of users, which makes research on its speech recognition highly significant.

At present, more and more acoustic models in speech recognition are built with neural networks and studied in depth. Among them, the Deep Neural Network (DNN) is the mainstream acoustic model. However, too many network layers can destroy the characteristics of the speech signal and affect the recognition results. The Convolutional Neural Network (CNN) has a distinctive convolution and pooling structure, which reduces the number of parameters during training, copes better with large amounts of Chinese data, and lowers model complexity, making it more suitable for Chinese speech recognition. Therefore, to improve the accuracy of Chinese speech recognition, this thesis applies CNNs to design and build a Chinese speech recognition system based on a deep convolutional neural network acoustic model. The work of this thesis includes:

(1) To address the forced alignment of speech required when training traditional acoustic models, an end-to-end convolutional neural network (CTC-CNN) acoustic model was proposed, combining the CNN with an end-to-end structure to optimize the likelihood of the input and output sequences. Experimental results show that the Chinese speech recognition system based on the CTC-CNN acoustic model has an error rate of 23.6%. Compared with the system based on the CNN acoustic model, accuracy improves by about 1.2%.

(2) In the CTC-CNN model, the CNN is a shallow structure with two convolutional layers, and the recognition performance of such a shallow model is limited. To further improve accuracy, an end-to-end deep convolutional neural network (CTC-DCNN) model was designed based on the residual block structure. The vanishing gradient problem of the model was alleviated by optimizing with the maxout function, yielding an improved end-to-end deep convolutional neural network acoustic model (CTC-DCNN optimization) with better accuracy and network modeling ability. Experimental results show that, compared with the CNN model, this model reduces the word error rate in speech recognition by 4% to 4.7%.

(3) A Chinese speech recognition system based on the deep convolutional neural network was designed and built. The CTC-DCNN optimization model was compared experimentally with the traditional CNN model, the CTC-CNN model, and the DCNN acoustic model, and the results were analyzed. In addition, the performance of the model was further validated with different numbers of iterations, and the CTC-DCNN acoustic model was given a preliminary study in noisy environments.
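Two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and every name in it is hypothetical. The first function shows how a CTC (Connectionist Temporal Classification) output can be read without frame-level alignment: take the best label per frame, merge consecutive repeats, and drop the blank label. Greedy decoding and a blank index of 0 are assumptions made here for illustration; the abstract does not state which decoder or label layout was used. The second function shows the maxout activation, which keeps the largest value in each group of units; the group size `pieces` is likewise an assumed parameter.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is reserved for the CTC blank label


def ctc_greedy_decode(frame_scores):
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    # Index of the highest-scoring label in each frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    # Merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks, which only separate repeated labels.
    return [label for label in collapsed if label != BLANK]


def maxout(pre_activations, pieces):
    """Maxout: split the units into groups of `pieces`, keep each group's max."""
    if pieces <= 0 or len(pre_activations) % pieces != 0:
        raise ValueError("number of units must be a positive multiple of pieces")
    return [max(pre_activations[i:i + pieces])
            for i in range(0, len(pre_activations), pieces)]


if __name__ == "__main__":
    # Six frames over three labels (blank, label 1, label 2).
    scores = [
        [0.1, 0.8, 0.1],
        [0.1, 0.7, 0.2],
        [0.9, 0.05, 0.05],
        [0.2, 0.7, 0.1],
        [0.1, 0.2, 0.7],
        [0.1, 0.1, 0.8],
    ]
    print(ctc_greedy_decode(scores))
    print(maxout([1.0, -2.0, 0.5, 3.0], pieces=2))
```

In the toy input, the blank frame between the two runs of label 1 is what lets the decoder output that label twice. This is how CTC separates a genuinely repeated label from a single label that spans several frames.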
Keywords/Search Tags: Chinese Speech Recognition, Convolutional Neural Network, Activation Function, End-to-End Structure