
Research and Optimization of the Encoder-Decoder in End-to-End Speech Recognition

Posted on: 2021-01-27    Degree: Master    Type: Thesis
Country: China    Candidate: T Zhu    Full Text: PDF
GTID: 2428330614463776    Subject: Software engineering
Abstract/Summary:
Automatic speech recognition makes it possible for computers to follow human voice commands and understand human language. It helps speakers of different languages communicate and can improve working conditions and work efficiency. Owing to their simple structure, fast decoding, and high accuracy, end-to-end speech recognition methods have attracted widespread attention. This thesis is organized around end-to-end speech recognition; the main work and contributions are as follows.

An end-to-end model has a simpler structure than a hybrid system and avoids the hybrid model's inconsistent optimization objectives. However, redundant features can appear in the representations extracted by the encoder, which reduces its effectiveness. To address this problem, an encoder with a triangular structure is proposed. In addition, a dropout regularization method is introduced to combat the overfitting of recurrent neural networks. The decoder combines connectionist temporal classification (CTC) and attention for both training and decoding (a minimal sketch of the joint objective is given below).

The language knowledge of an end-to-end speech recognizer comes only from the transcripts in its training data, which is limited compared with a dedicated language model and therefore constrains recognition accuracy. Building on the end-to-end model, the thesis proposes an external language model suited to end-to-end speech recognition. An independent recurrent neural network (IndRNN) serves as the basic unit of the language model; it can capture longer context and thus reaches a lower perplexity. A new regularization method, IndDrop, which applies dropout to the fully connected transformations between layers while keeping the recurrent (time-series) connections intact, is designed to address the overfitting of the IndRNN (see the second sketch below). To ease the heavy computation of the softmax layer, a log-bilinear model is introduced. The language model achieves a perplexity of 87.3 on the PTB benchmark dataset. Applying this language model to the end-to-end system improves speech recognition performance by 3.4% compared with the case where no external language model is used.
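A minimal, hedged sketch of the joint CTC/attention objective described above, written in PyTorch for illustration only; the class name, the interpolation weight ctc_weight, and the tensor layouts are assumptions rather than the thesis implementation:

    # Sketch: weighted combination of a CTC loss and an attention
    # (cross-entropy) loss for encoder-decoder ASR training.
    import torch
    import torch.nn as nn

    class JointCTCAttentionLoss(nn.Module):
        def __init__(self, blank_id=0, ctc_weight=0.3, pad_id=-1):
            super().__init__()
            self.ctc_weight = ctc_weight                       # lambda in [0, 1]
            self.ctc_loss = nn.CTCLoss(blank=blank_id, zero_infinity=True)
            self.att_loss = nn.CrossEntropyLoss(ignore_index=pad_id)

        def forward(self, ctc_log_probs, att_logits, att_targets,
                    ctc_targets, input_lengths, target_lengths):
            # ctc_log_probs: (T, B, V) log-probabilities from the encoder head
            # att_logits:    (B, L, V) logits from the attention decoder
            # att_targets:   (B, L) label ids padded with pad_id
            # ctc_targets:   1-D tensor of concatenated label ids (no blanks)
            l_ctc = self.ctc_loss(ctc_log_probs, ctc_targets,
                                  input_lengths, target_lengths)
            l_att = self.att_loss(att_logits.transpose(1, 2), att_targets)
            # Interpolate the two objectives with a single weight.
            return self.ctc_weight * l_ctc + (1.0 - self.ctc_weight) * l_att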
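The IndRNN/IndDrop idea can likewise be sketched: each neuron keeps a single scalar recurrent weight, and dropout is applied only to the feed-forward connection between layers. Again, this is an assumed illustration (the layer name, ReLU activation, initialization, and dropout rate are not taken from the thesis):

    # Sketch: IndRNN layer with "IndDrop"-style dropout on the
    # between-layer (input-to-hidden) path only; the element-wise
    # recurrent connection is left untouched.
    import torch
    import torch.nn as nn

    class IndRNNLayer(nn.Module):
        def __init__(self, input_size, hidden_size, dropout=0.25):
            super().__init__()
            self.input_fc = nn.Linear(input_size, hidden_size)   # W x_t + b
            # One recurrent weight per neuron: neurons are independent over time.
            self.recurrent_u = nn.Parameter(torch.empty(hidden_size).uniform_(-1, 1))
            self.ind_drop = nn.Dropout(dropout)                  # IndDrop path

        def forward(self, x, h0=None):
            # x: (B, T, input_size) -> returns (B, T, hidden_size)
            batch, steps, _ = x.shape
            hidden_size = self.recurrent_u.numel()
            h = x.new_zeros(batch, hidden_size) if h0 is None else h0
            feed = self.ind_drop(self.input_fc(x))   # dropout on feed-forward path
            outputs = []
            for t in range(steps):
                # h_t = relu(W x_t + u * h_{t-1} + b)
                h = torch.relu(feed[:, t, :] + self.recurrent_u * h)
                outputs.append(h)
            return torch.stack(outputs, dim=1)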
Keywords/Search Tags:Speech recognition, end-to-end, attention mechanism, CTC, independent recurrent neural network