
Adaptive Learning And Multi-Scale Forward Attention-Based Automatic Speech Recognition

Posted on: 2021-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: H T Tang
Full Text: PDF
GTID: 2428330614450997
Subject: Computer Science and Technology
Abstract/Summary:
As an effective method of converting human speech into text, Automatic Speech Recognition (ASR) has become a principal technology in many fields. Recently, end-to-end deep learning methods have been widely applied to ASR, among which the Connectionist Temporal Classification (CTC) model and the attention model with an encoder-decoder structure are the most commonly used. These two models avoid the forced alignment required by traditional methods and can therefore be optimized more directly and more generally. Compared with CTC, the attention model does not need the frame-independence assumption and thus achieves better performance. Because the attention model was proposed only recently, it has not yet been studied extensively and in depth. This thesis studies the attention model from the following two aspects:

(1) The network structure of an attention-based speech recognition system is relatively complex; in particular, when the gradient descent algorithm is used for back-propagation, the encoder is updated relatively weakly. To strengthen the encoder, a CTC loss is added after the encoder and combined with the attention loss to form a multi-task learning objective. In multi-task learning, the CTC and attention tasks are not equally important, so determining their coefficients by manual parameter tuning is time-consuming and inefficient on a large-scale corpus. To solve this problem, an adaptive algorithm is introduced on top of multi-task learning: a sigmoid function is learned over the CTC and attention losses so that different coefficients are generated automatically at every moment. Experiments show that this adaptive algorithm reduces the training time of the model and improves recognition performance.

(2) The traditional attention model may produce abnormal values when computing attention scores. A forward attention model is therefore adopted, which uses the forward algorithm to smooth an abnormal attention value at the current moment with the normal attention scores of the previous moment. Since the attention score of each frame at the previous moment contributes to a different degree, the forward attention model is further optimized by adding constraints, with the constraint factors computed by a neural network to achieve adaptive smoothing. Meanwhile, the traditional attention model also suffers from the limited modeling ability of a single convolutional window. Although the multi-head attention mechanism alleviates this problem, it uses convolutional filters of a single size and can therefore only capture speech patterns of a fixed temporal extent. Based on the multi-head attention model, a multi-scale attention model is proposed in which convolutional filters of different sizes are used for each head to model speech primitives at different levels. The forward attention model is then combined with the multi-scale attention model to form the multi-scale forward attention model. Experiments show that the recognition performance of this model is greatly improved compared with the baseline system.
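To make the adaptive multi-task weighting in (1) concrete, the following is a minimal PyTorch sketch of the idea, not the thesis's actual implementation: a single learnable scalar is passed through a sigmoid to produce the CTC weight, and the attention (cross-entropy) branch receives the complementary weight, so the balance between the two losses is learned jointly with the model instead of being hand-tuned. All class, argument, and hyper-parameter names are illustrative assumptions.

```python
import torch
import torch.nn as nn


class AdaptiveCTCAttentionLoss(nn.Module):
    """Sketch of an adaptively weighted CTC + attention multi-task loss."""

    def __init__(self, blank_id: int = 0):
        super().__init__()
        # Unconstrained parameter; the sigmoid keeps the mixing weight in (0, 1).
        self.raw_weight = nn.Parameter(torch.zeros(1))
        self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)
        self.ce = nn.CrossEntropyLoss(ignore_index=-1)

    def forward(self, ctc_log_probs, targets, input_lengths, target_lengths,
                att_logits, att_targets):
        # ctc_log_probs: (T, N, C) log-softmax outputs of the encoder's CTC head
        # att_logits:    (N, L, C) decoder outputs of the attention branch
        lam = torch.sigmoid(self.raw_weight)
        loss_ctc = self.ctc(ctc_log_probs, targets, input_lengths, target_lengths)
        loss_att = self.ce(att_logits.reshape(-1, att_logits.size(-1)),
                           att_targets.reshape(-1))
        # Learned interpolation between the two task losses.
        return (lam * loss_ctc + (1.0 - lam) * loss_att).squeeze()
```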
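For the forward-attention smoothing in (2), one plausible reading is sketched below, assuming the smoothing mixes the previous step's attention weights (and their one-frame shift) with the current scores under a learned constraint factor; the function name, the gate, and its shape are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def forward_attention_smoothing(prev_alpha, curr_score, gate):
    """Smooth the current attention weights with the previous step's weights.

    prev_alpha : (batch, frames) normalized attention weights at step t-1
    curr_score : (batch, frames) softmaxed attention weights at step t
    gate       : (batch, 1) constraint factor in (0, 1), e.g. predicted by a
                 small network from the decoder state
    """
    # Previous weights shifted one frame to the right (monotonic progress).
    shifted = F.pad(prev_alpha, (1, 0))[:, :-1]
    # An abnormal score at step t is pulled back towards the well-behaved
    # distribution of step t-1; the gate controls how strongly each part counts.
    smoothed = ((1.0 - gate) * prev_alpha + gate * shifted) * curr_score
    # Renormalize to a proper distribution over encoder frames.
    return smoothed / smoothed.sum(dim=-1, keepdim=True).clamp_min(1e-8)
```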
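The multi-scale attention idea in (2) can likewise be sketched as giving each head its own 1-D convolution with a different kernel size, here applied to the previous alignments as in location-aware attention; the number of heads, kernel sizes, and channel counts below are illustrative assumptions, not the settings used in the thesis.

```python
import torch
import torch.nn as nn


class MultiScaleLocationConv(nn.Module):
    """Per-head location convolutions with different kernel sizes (sketch)."""

    def __init__(self, num_heads=4, channels=8, kernel_sizes=(7, 15, 31, 63)):
        super().__init__()
        assert len(kernel_sizes) == num_heads
        # One convolution per head; a larger kernel lets that head track
        # speech patterns of a longer temporal extent.
        self.convs = nn.ModuleList([
            nn.Conv1d(1, channels, k, padding=k // 2, bias=False)
            for k in kernel_sizes
        ])

    def forward(self, prev_alignments):
        # prev_alignments: (batch, num_heads, frames) attention weights at t-1
        feats = []
        for h, conv in enumerate(self.convs):
            # (batch, 1, frames) -> (batch, channels, frames)
            feats.append(conv(prev_alignments[:, h:h + 1, :]))
        # (batch, num_heads, channels, frames): one scale of location features
        # per head, to be fed into each head's attention scoring function.
        return torch.stack(feats, dim=1)
```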
Keywords/Search Tags: Automatic Speech Recognition, Attention Model, Multi-task Learning, Forward Algorithm, Multi-scale Model