
Research And Implementation Of Speech Recognition Algorithm Based On Recurrent Neural Network

Posted on: 2021-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: J R Dong
Full Text: PDF
GTID: 2428330611967583
Subject: Computer technology

Abstract/Summary:
With the rapid development and popularization of artificial intelligence, speech recognition has brought more and more convenience to people's lives. Applications such as smart speakers, voice assistants, and voice input methods can be seen everywhere, and users' requirements for speech recognition have gradually risen, mainly in terms of accuracy and efficiency. Traditional speech recognition is based on the hidden Markov model, but with the rapid growth of data volume, the processing efficiency of hidden-Markov-model-based methods increasingly fails to meet these needs, and researchers have begun to apply deep learning methods to the field of speech recognition. However, many models consider only the current state when processing the speech signal, ignoring the influence of contextual relevance on the current output, and spend too much time aligning input and output, so neither recognition accuracy nor efficiency is high enough. This paper argues that assigning "memory" weight to relevant contextual information helps to judge the current output and to convert the temporal information of speech into the correct characters more accurately.

In view of these problems, this paper proposes a hybrid GRU-CTC model that incorporates the LeakyReLU function. The model has two main characteristics. First, the Gated Recurrent Unit (GRU), a variant structure of the recurrent neural network, is introduced: contextual relevance is fully considered through its double gating mechanism, and selective memory is carried out through weight assignment. On this basis, the LeakyReLU activation function is incorporated to improve the training convergence efficiency of the model. Second, Connectionist Temporal Classification (CTC) is introduced, which can process the whole input sequence directly without prior alignment of the speech data and the text data. This removes the time-consuming input-output alignment step of conventional models and improves the speed of model training.

To verify the proposed model, two groups of experiments are conducted. The first group compares GRU-CTC with two other training models. The results show that the character error rate (CER) of the GRU-CTC hybrid model is the lowest of the three; compared with the suboptimal LSTM-CTC, the CER is reduced by 1.03%, improving accuracy to a certain extent. The second group compares and analyzes GRU-CTC with different activation functions under different language models. The results show that the CER of Leaky GRU-CTC under the tri-gram language model is 1.32% lower than that of the suboptimal model, with faster training convergence and higher recognition accuracy.
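The thesis abstract does not include code; as a rough illustration of the architecture it describes, the following is a minimal PyTorch-style sketch that combines a bidirectional GRU acoustic model, a LeakyReLU projection, and CTC loss. The class name GRUCTCModel, the layer count, hidden size, feature dimension (80 mel bands), and vocabulary size are illustrative assumptions, not values taken from the thesis.

# Minimal sketch (not the thesis implementation): a GRU acoustic model with a
# LeakyReLU projection head trained with CTC loss. All sizes are assumptions.
import torch
import torch.nn as nn

class GRUCTCModel(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab_size=29):
        super().__init__()
        # Bidirectional GRU captures left and right context of each frame.
        self.gru = nn.GRU(n_mels, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.act = nn.LeakyReLU(negative_slope=0.01)
        # Output layer maps to vocab_size + 1 classes (index 0 is the CTC blank).
        self.fc = nn.Linear(hidden * 2, vocab_size + 1)

    def forward(self, feats):                     # feats: (batch, time, n_mels)
        out, _ = self.gru(feats)
        logits = self.fc(self.act(out))           # (batch, time, vocab + 1)
        # CTCLoss expects log-probabilities shaped (time, batch, classes).
        return logits.log_softmax(dim=-1).transpose(0, 1)

# Usage sketch with random data: CTC aligns the frame-level outputs with the
# target transcript without any pre-computed frame/character alignment.
model = GRUCTCModel()
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

feats = torch.randn(4, 200, 80)                   # 4 utterances, 200 frames each
targets = torch.randint(1, 30, (4, 20))           # 4 transcripts, 20 tokens each
input_lengths = torch.full((4,), 200, dtype=torch.long)
target_lengths = torch.full((4,), 20, dtype=torch.long)

log_probs = model(feats)                          # (200, 4, 30)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()

The design point this sketch illustrates is that nn.CTCLoss consumes frame-level log-probabilities together with the unaligned target transcript, so no frame-to-character alignment has to be prepared beforehand, which is the property the abstract attributes to CTC.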
Keywords/Search Tags:Deep learning, Speech Recognition, Recurrent neural network, Gated Recurrent Unit, Connectionist temporal classification, LeakyReLU