
Research On End-to-End Speech Recognition Method Based On Self-Attention Mechanism

Posted on: 2021-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Lei
Full Text: PDF
GTID: 2518306497957579
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of computer and artificial intelligence technology, Automatic Speech Recognition (ASR) has become a key method for human-computer interaction and is widely used in practical applications such as smart homes, smart wearables, and intelligent dialogue systems. End-to-end speech recognition models are simple in structure, flexible in modeling, and require less memory during decoding; in many application scenarios they achieve better recognition results than traditional hybrid models, and they have become a popular research direction in ASR. In recent years, the Transformer model, which relies solely on self-attention to model sequences, has shown strong sequence-modeling capability and achieved good results in many natural language processing tasks. Building on research into end-to-end speech recognition based on the self-attention mechanism, this thesis proposes an improved end-to-end model that addresses two difficulties of self-attention in speech modeling: learning the positional relationships and alignments of speech sequences, and the dilution of attention weights. The main research contents are summarized as follows:

(1) A SAC model combining self-attention with a Convolutional Neural Network (CNN) is proposed. To address the inability of the self-attention mechanism to model positional relationships within a sequence, a convolutional neural network replaces the sine-and-cosine positional encoding of the original Transformer so that positional relationships are learned automatically; training and decoding tricks for the SAC model are also studied. Experiments verify that the SAC model achieves better speech recognition performance than the Transformer model.

(2) A CTC/SAC hybrid model based on Connectionist Temporal Classification (CTC) and SAC is proposed. The SAC model has difficulty learning the alignment between the speech feature sequence and the output sequence using the self-attention mechanism alone, whereas CTC models sequence alignment easily through the Markov assumption and the forward-backward algorithm. Multi-task learning is therefore used to construct a CTC/SAC hybrid model that combines the modeling advantages of both, and joint training and decoding of the hybrid model are realized. Test results show that the CTC/SAC model improves convergence speed and recognition accuracy over the SAC model.

(3) Optimization of the CTC/SAC hybrid model is studied. An externally trained language model is added to the CTC/SAC hybrid model through shallow fusion to improve its modeling ability. In addition, considering that an output unit in speech recognition depends mainly on a few adjacent speech frames and that the attention computation can be disturbed by noise, self-attention weight bias terms are added to the CTC/SAC hybrid model to suppress attention outside the region of interest, further improving the recognition accuracy of the hybrid model.
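Contribution (1) replaces the Transformer's fixed sinusoidal positional encoding with a convolution over the input features, so that position information is inferred from local context rather than injected by a fixed formula. The sketch below contrasts the two ideas on toy dimensions; the kernel, feature sizes, and the identity-centre initialization are illustrative assumptions, not the thesis's actual architecture.

```python
import math

def sinusoidal_encoding(seq_len, d_model):
    """Fixed sine/cosine positional encoding from the original Transformer."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def conv1d_position(features, kernel, bias=0.0):
    """Learnable alternative: a 1-D convolution over the feature sequence.
    `features` is a list of frames (each a list of d_model values);
    `kernel` is k weight matrices (k x d_model x d_model), zero-padded
    so the output has the same length as the input."""
    k = len(kernel)
    pad = k // 2
    d = len(features[0])
    padded = [[0.0] * d] * pad + features + [[0.0] * d] * pad
    out = []
    for t in range(len(features)):
        frame = [bias] * d
        for j in range(k):
            for o in range(d):
                for i in range(d):
                    frame[o] += kernel[j][o][i] * padded[t + j][i]
        out.append(frame)
    return out

# Toy usage: 5 frames, 4-dim features, kernel size 3.
seq_len, d_model = 5, 4
x = [[float(t)] * d_model for t in range(seq_len)]
ident = [[1.0 if o == i else 0.0 for i in range(d_model)] for o in range(d_model)]
zero = [[0.0] * d_model for _ in range(d_model)]
kernel = [zero, ident, zero]  # centre tap = identity, so output == input here
pe = sinusoidal_encoding(seq_len, d_model)
conv_out = conv1d_position(x, kernel)
```

With an identity centre tap the convolution reproduces its input, which makes the padding and indexing easy to check; in training the kernel weights would instead be learned jointly with the rest of the model.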
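Contribution (2) relies on CTC's ability to model alignment through the Markov assumption and the forward-backward algorithm. As a sketch of that mechanism only (toy vocabulary and hand-made frame posteriors, not the thesis's model), the forward pass below sums the probability of a label sequence over all of its blank-augmented alignments:

```python
def ctc_forward(probs, labels, blank=0):
    """Total probability of `labels` under per-frame distributions `probs`
    (a T x V list of lists), summing over all CTC alignments via dynamic
    programming on the blank-extended label sequence."""
    ext = [blank]
    for l in labels:
        ext += [l, blank]              # e.g. [a, b] -> [_, a, _, b, _]
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s > 0:
                a += alpha[t - 1][s - 1]
            # skip transition allowed for a non-blank with no repeated label
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)

# Toy check: 2 frames, vocab {0: blank, 1: 'a'}, target label "a".
# Valid alignments are (a,_), (_,a), (a,a): 0.6*0.5 + 0.4*0.5 + 0.6*0.5 = 0.8
p = [[0.4, 0.6], [0.5, 0.5]]           # p[t][v]
total = ctc_forward(p, [1])
```

In the CTC/SAC hybrid, this CTC objective (in log space, with a backward pass for gradients) is combined with the attention decoder's loss under multi-task learning; a real implementation would also work in log space for numerical stability.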
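Contribution (3) adds an externally trained language model via shallow fusion: at each decoding step the recognizer's score and the LM score are combined log-linearly. A minimal sketch with hand-set toy probabilities; the fusion weight `beta` is a hypothetical value that would be tuned on held-out data in practice.

```python
import math

def shallow_fusion_step(am_logp, lm_logp, beta=0.3):
    """Pick the next token by score(v) = log P_am(v) + beta * log P_lm(v)."""
    scores = {v: am_logp[v] + beta * lm_logp[v] for v in am_logp}
    return max(scores, key=scores.get), scores

# Toy vocab: the recognizer slightly prefers "there",
# but the language model strongly prefers "their".
am = {"their": math.log(0.45), "there": math.log(0.55)}
lm = {"their": math.log(0.9), "there": math.log(0.1)}
best, scores = shallow_fusion_step(am, lm, beta=0.5)
```

Here the LM evidence overturns the recognizer's choice, which is exactly the effect shallow fusion is meant to provide; in beam-search decoding the same combined score would rank every hypothesis extension.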
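The same contribution biases the self-attention weights so that attention outside a local window around the current frame is suppressed, reflecting the observation that an output unit mainly depends on a few adjacent speech frames. A sketch with a hypothetical fixed window and additive penalty; the thesis's exact bias form is not reproduced here.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def biased_attention(scores, query_pos, window=2, penalty=5.0):
    """Subtract a bias from raw attention scores for keys outside
    [query_pos - window, query_pos + window], then renormalise."""
    biased = [s - (0.0 if abs(i - query_pos) <= window else penalty)
              for i, s in enumerate(scores)]
    return softmax(biased)

raw = [1.0] * 9                        # uniform raw scores over 9 frames
plain = softmax(raw)                   # spreads weight evenly (1/9 each)
local = biased_attention(raw, query_pos=4, window=2, penalty=5.0)
in_window = sum(local[2:7])            # mass kept near the query frame
```

After the bias, almost all attention mass stays within the window, so noisy frames far from the region of interest contribute little to the weighted sum.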
Keywords/Search Tags:Automatic Speech Recognition, end-to-end, Self-attention mechanism, Connectionist Temporal Classification, multi-task learning