
Research and Optimization of the Encoder-Decoder in End-to-End Speech Recognition

Posted on: 2021-01-27    Degree: Master    Type: Thesis
Country: China    Candidate: T Zhu    Full Text: PDF
GTID: 2428330614463776    Subject: Software engineering
Abstract/Summary:
Automatic speech recognition makes it possible for computers to follow human voice commands and understand human language. It helps speakers of different languages communicate and can improve working conditions and work efficiency. Owing to their simple structure, fast decoding, and high accuracy, end-to-end speech recognition methods have attracted widespread attention. This thesis is organized around end-to-end speech recognition; the main work and contributions are as follows.

An end-to-end model has a simpler structure than a hybrid system and avoids the hybrid model's inconsistent optimization objectives. However, redundant features can appear in the representations extracted by the encoder, which reduces its effectiveness. To address this problem, an encoder with a triangular structure is proposed. In addition, a dropout regularization method is introduced to combat the overfitting of recurrent neural networks. The decoder combines connectionist temporal classification (CTC) and attention for both training and decoding (a minimal sketch of the joint objective is given below).

The language knowledge of an end-to-end speech recognizer comes only from the transcripts in its training data, which is limited compared with a dedicated language model and therefore constrains recognition accuracy. Building on the end-to-end model, the thesis proposes an external language model suited to end-to-end speech recognition. An independent recurrent neural network (IndRNN) serves as the basic unit of the language model; it can capture longer context and thus reaches a lower perplexity. A new regularization method, IndDrop, which applies dropout to the fully connected transformations between layers while keeping the recurrent (time-series) connections intact, is designed to address the overfitting of the IndRNN (see the second sketch below). To ease the heavy computation of the softmax layer, a log-bilinear model is introduced. The language model achieves a perplexity of 87.3 on the PTB benchmark dataset. Applying this language model to the end-to-end system improves speech recognition performance by 3.4% compared with the case where no external language model is used.
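A minimal, hedged sketch of the joint CTC/attention objective described above, written in PyTorch for illustration only; the class name, the interpolation weight ctc_weight, and the tensor layouts are assumptions rather than the thesis implementation:

    # Sketch: weighted combination of a CTC loss and an attention
    # (cross-entropy) loss for encoder-decoder ASR training.
    import torch
    import torch.nn as nn

    class JointCTCAttentionLoss(nn.Module):
        def __init__(self, blank_id=0, ctc_weight=0.3, pad_id=-1):
            super().__init__()
            self.ctc_weight = ctc_weight                       # lambda in [0, 1]
            self.ctc_loss = nn.CTCLoss(blank=blank_id, zero_infinity=True)
            self.att_loss = nn.CrossEntropyLoss(ignore_index=pad_id)

        def forward(self, ctc_log_probs, att_logits, att_targets,
                    ctc_targets, input_lengths, target_lengths):
            # ctc_log_probs: (T, B, V) log-probabilities from the encoder head
            # att_logits:    (B, L, V) logits from the attention decoder
            # att_targets:   (B, L) label ids padded with pad_id
            # ctc_targets:   1-D tensor of concatenated label ids (no blanks)
            l_ctc = self.ctc_loss(ctc_log_probs, ctc_targets,
                                  input_lengths, target_lengths)
            l_att = self.att_loss(att_logits.transpose(1, 2), att_targets)
            # Interpolate the two objectives with a single weight.
            return self.ctc_weight * l_ctc + (1.0 - self.ctc_weight) * l_att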
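The IndRNN/IndDrop idea can likewise be sketched: each neuron keeps a single scalar recurrent weight, and dropout is applied only to the feed-forward connection between layers. Again, this is an assumed illustration (the layer name, ReLU activation, initialization, and dropout rate are not taken from the thesis):

    # Sketch: IndRNN layer with "IndDrop"-style dropout on the
    # between-layer (input-to-hidden) path only; the element-wise
    # recurrent connection is left untouched.
    import torch
    import torch.nn as nn

    class IndRNNLayer(nn.Module):
        def __init__(self, input_size, hidden_size, dropout=0.25):
            super().__init__()
            self.input_fc = nn.Linear(input_size, hidden_size)   # W x_t + b
            # One recurrent weight per neuron: neurons are independent over time.
            self.recurrent_u = nn.Parameter(torch.empty(hidden_size).uniform_(-1, 1))
            self.ind_drop = nn.Dropout(dropout)                  # IndDrop path

        def forward(self, x, h0=None):
            # x: (B, T, input_size) -> returns (B, T, hidden_size)
            batch, steps, _ = x.shape
            hidden_size = self.recurrent_u.numel()
            h = x.new_zeros(batch, hidden_size) if h0 is None else h0
            feed = self.ind_drop(self.input_fc(x))   # dropout on feed-forward path
            outputs = []
            for t in range(steps):
                # h_t = relu(W x_t + u * h_{t-1} + b)
                h = torch.relu(feed[:, t, :] + self.recurrent_u * h)
                outputs.append(h)
            return torch.stack(outputs, dim=1)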
Keywords/Search Tags:Speech recognition, end-to-end, attention mechanism, CTC, independent recurrent neural network