Research On Algorithms Of Speech Sentence Recognition Based On Deep Learning

Posted on: 2020-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Su
Full Text: PDF
GTID: 2428330602950434
Subject: Communication and Information System
Abstract/Summary:
Speech is one of the most important modes of human-computer interaction. Speech recognition technology lets computers process natural language by converting input speech into data that computers can interpret. In recent years, the rapid development of portable devices and artificial intelligence has broadened the application fields of speech recognition. After several decades of optimization, traditional speech recognition technology has hit a bottleneck in recognition rate. Meanwhile, deep learning has matured: many neural networks with excellent modeling capabilities have been developed and applied successfully in various fields. Applying deep learning to speech recognition is therefore expected to improve on the performance of current systems. In addition, as the demands on human-computer interaction grow, application scenarios are shifting from simple command-word recognition to continuous speech recognition, which makes research on single-sentence speech recognition necessary.

Traditional speech recognition technology relies mainly on the GMM-HMM structure. A GMM-HMM system models the temporal structure of speech with the HMM and uses the GMM to fit the feature distribution of each HMM state, giving an approximate model of the speech. The limitations of GMM-HMM, however, cap the performance of traditional systems. Neural networks have powerful nonlinear modeling capabilities that can compensate for the weaknesses of the GMM-HMM model, so this thesis applies a variety of neural network structures to speech recognition systems.

This thesis first analyzes and implements a traditional GMM-HMM speech recognition system and uses it as the baseline for comparison. Because the GMM is poorly suited to modeling nonlinear data, a DNN is introduced to replace the GMM part of the traditional system and classify the HMM states. DBN pre-training is used to optimize the performance of the DNN-HMM system, the acoustic model is refined to raise recognition accuracy, and DAE technology is applied in data preprocessing to reduce noise interference. Because the DNN has a sufficiently complex network structure and uses nonlinear activation functions, the DNN system performs better than the GMM system on single-sentence speech recognition. The experimental results in this thesis support this conclusion.

The thesis then explores other deep learning techniques for single-sentence speech recognition. First, a small-vocabulary speech recognition system based on a CNN is implemented. The speech signal is converted into a two-dimensional time-frequency map and fed to the input layer of the network. The convolution and pooling layers produce a new feature set, and these feature maps pass through the fully connected layer and are classified with the softmax function, which gives good recognition on a small vocabulary. The experimental results show that a CNN can extract features from, and directly classify, speech signals of similar length. However, a CNN cannot capture the temporal information in speech, so it needs to be combined with other neural networks. A single-sentence speech recognition system based on LSTM is therefore studied. With the CTC (connectionist temporal classification) loss function, the LSTM network can align the speech with the recognized text over time, so the LSTM can replace the HMM used in traditional speech recognition. Finally, LSTM and CNN are combined: the CNN extracts speech features and the LSTM performs temporal alignment. The experimental results show that the CNN-LSTM system outperforms the GMM-HMM system on single-sentence speech recognition.
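
To illustrate how CTC lets a network map a sequence of frames to a shorter text without an HMM, here is a minimal sketch of greedy CTC decoding. It is not taken from the thesis: the function name `ctc_greedy_decode`, the choice of index 0 for the blank symbol, and the toy scores are all assumptions made for illustration. It covers only the decoding rule applied at inference time (take the best label per frame, merge consecutive repeats, drop blanks), not the CTC loss used for training.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    decoded: List[int] = []
    previous = None  # best label of the previous frame
    for scores in frame_scores:
        # Index of the highest-scoring label for this frame.
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded


# Toy per-frame scores over the labels {0: blank, 1: "h", 2: "i"}.
frames = [
    [0.1, 0.8, 0.1],    # "h"
    [0.2, 0.7, 0.1],    # "h" again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # "i"
    [0.1, 0.2, 0.7],    # "i" again, merged
    [0.6, 0.2, 0.2],    # blank
]
labels = ctc_greedy_decode(frames)
```

Six input frames collapse to two output labels, which is the sense in which CTC handles alignment between speech frames and text. A blank between two identical labels keeps them separate, so repeated characters can still be produced.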
Keywords/Search Tags: Speech Recognition, Deep Learning, Deep Neural Network, Convolutional Neural Networks, Long Short-Term Memory