
Adaptive Learning And Multi-Scale Forward Attention-Based Automatic Speech Recognition

Posted on: 2021-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: H T Tang
Full Text: PDF
GTID: 2428330614450997
Subject: Computer Science and Technology
Abstract/Summary:
As an effective method of converting human speech into text, Automatic Speech Recognition (ASR) has become a principal technology in many fields. Recently, end-to-end deep learning methods have been widely applied to ASR, among which the Connectionist Temporal Classification (CTC) model and the attention model with an encoder-decoder structure are the most commonly used. These two models avoid the forced alignment required by traditional methods and can therefore be optimized more directly and more generally. Compared with CTC, the attention model does not need the frame-independence assumption and thus achieves better performance. Because the attention model was proposed only recently, it has not yet been studied extensively and in depth. This thesis studies the attention model from the following two aspects:

(1) The network structure of an attention-based speech recognition system is relatively complex; in particular, when the gradient descent algorithm is used for back-propagation, the encoder is updated relatively weakly. To strengthen the encoder, a CTC loss is added after the encoder and combined with the attention loss to form a multi-task learning objective. In multi-task learning, the CTC and attention tasks are not equally important, so determining their coefficients by manual parameter tuning is time-consuming and inefficient on a large-scale corpus. To solve this problem, an adaptive algorithm is introduced on top of multi-task learning: a sigmoid function is learned over the CTC and attention losses so that different coefficients are generated automatically at every moment. Experiments show that this adaptive algorithm reduces the training time of the model and improves recognition performance.

(2) The traditional attention model may produce abnormal values when computing attention scores. A forward attention model is therefore adopted, which uses the forward algorithm to smooth an abnormal attention value at the current moment with the normal attention scores of the previous moment. Since the attention score of each frame at the previous moment contributes to a different degree, the forward attention model is further optimized by adding constraints, with the constraint factors computed by a neural network to achieve adaptive smoothing. Meanwhile, the traditional attention model also suffers from the limited modeling ability of a single convolutional window. Although the multi-head attention mechanism alleviates this problem, it uses convolutional filters of a single size and can therefore only capture speech patterns of a fixed temporal extent. Based on the multi-head attention model, a multi-scale attention model is proposed in which convolutional filters of different sizes are used for each head to model speech primitives at different levels. The forward attention model is then combined with the multi-scale attention model to form the multi-scale forward attention model. Experiments show that the recognition performance of this model is greatly improved compared with the baseline system.
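To make the adaptive multi-task weighting in (1) concrete, the following is a minimal PyTorch sketch of the idea, not the thesis's actual implementation: a single learnable scalar is passed through a sigmoid to produce the CTC weight, and the attention (cross-entropy) branch receives the complementary weight, so the balance between the two losses is learned jointly with the model instead of being hand-tuned. All class, argument, and hyper-parameter names are illustrative assumptions.

```python
import torch
import torch.nn as nn


class AdaptiveCTCAttentionLoss(nn.Module):
    """Sketch of an adaptively weighted CTC + attention multi-task loss."""

    def __init__(self, blank_id: int = 0):
        super().__init__()
        # Unconstrained parameter; the sigmoid keeps the mixing weight in (0, 1).
        self.raw_weight = nn.Parameter(torch.zeros(1))
        self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)
        self.ce = nn.CrossEntropyLoss(ignore_index=-1)

    def forward(self, ctc_log_probs, targets, input_lengths, target_lengths,
                att_logits, att_targets):
        # ctc_log_probs: (T, N, C) log-softmax outputs of the encoder's CTC head
        # att_logits:    (N, L, C) decoder outputs of the attention branch
        lam = torch.sigmoid(self.raw_weight)
        loss_ctc = self.ctc(ctc_log_probs, targets, input_lengths, target_lengths)
        loss_att = self.ce(att_logits.reshape(-1, att_logits.size(-1)),
                           att_targets.reshape(-1))
        # Learned interpolation between the two task losses.
        return (lam * loss_ctc + (1.0 - lam) * loss_att).squeeze()
```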
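For the forward-attention smoothing in (2), one plausible reading is sketched below, assuming the smoothing mixes the previous step's attention weights (and their one-frame shift) with the current scores under a learned constraint factor; the function name, the gate, and its shape are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def forward_attention_smoothing(prev_alpha, curr_score, gate):
    """Smooth the current attention weights with the previous step's weights.

    prev_alpha : (batch, frames) normalized attention weights at step t-1
    curr_score : (batch, frames) softmaxed attention weights at step t
    gate       : (batch, 1) constraint factor in (0, 1), e.g. predicted by a
                 small network from the decoder state
    """
    # Previous weights shifted one frame to the right (monotonic progress).
    shifted = F.pad(prev_alpha, (1, 0))[:, :-1]
    # An abnormal score at step t is pulled back towards the well-behaved
    # distribution of step t-1; the gate controls how strongly each part counts.
    smoothed = ((1.0 - gate) * prev_alpha + gate * shifted) * curr_score
    # Renormalize to a proper distribution over encoder frames.
    return smoothed / smoothed.sum(dim=-1, keepdim=True).clamp_min(1e-8)
```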
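The multi-scale attention idea in (2) can likewise be sketched as giving each head its own 1-D convolution with a different kernel size, here applied to the previous alignments as in location-aware attention; the number of heads, kernel sizes, and channel counts below are illustrative assumptions, not the settings used in the thesis.

```python
import torch
import torch.nn as nn


class MultiScaleLocationConv(nn.Module):
    """Per-head location convolutions with different kernel sizes (sketch)."""

    def __init__(self, num_heads=4, channels=8, kernel_sizes=(7, 15, 31, 63)):
        super().__init__()
        assert len(kernel_sizes) == num_heads
        # One convolution per head; a larger kernel lets that head track
        # speech patterns of a longer temporal extent.
        self.convs = nn.ModuleList([
            nn.Conv1d(1, channels, k, padding=k // 2, bias=False)
            for k in kernel_sizes
        ])

    def forward(self, prev_alignments):
        # prev_alignments: (batch, num_heads, frames) attention weights at t-1
        feats = []
        for h, conv in enumerate(self.convs):
            # (batch, 1, frames) -> (batch, channels, frames)
            feats.append(conv(prev_alignments[:, h:h + 1, :]))
        # (batch, num_heads, channels, frames): one scale of location features
        # per head, to be fed into each head's attention scoring function.
        return torch.stack(feats, dim=1)
```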
Keywords/Search Tags: Automatic Speech Recognition, Attention Model, Multi-task Learning, Forward Algorithm, Multi-scale Model