
Research On Machine Translation Model Based On Self-Attention Mechanism

Posted on: 2021-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: S Fang
Full Text: PDF
GTID: 2428330605450055
Subject: Electronics and Communications Engineering
Abstract/Summary:
Language is an important carrier of knowledge and information. With the rapid advancement of the Internet, social informatization, and economic globalization, overcoming language barriers has become increasingly important. Machine translation is therefore of great practical significance for breaking through language barriers among countries, regions, and nations, for promoting communication among different peoples, and for reducing the burden of learning foreign languages.

This thesis first briefly introduces conventional statistical machine translation (SMT) and neural machine translation (NMT) and examines these models in terms of their respective advantages and disadvantages. On this basis, it describes the Transformer, a neural machine translation model built on the self-attention mechanism and multi-head self-attention, and identifies several shortcomings of the self-attention mechanism through targeted experiments. To address these shortcomings, the thesis proposes corresponding improvements, as follows.

First, through detailed theoretical and experimental analysis of the self-attention mechanism and the Transformer model, two problems are identified. One is that the attention heads of the multi-head self-attention networks in the Transformer learn their representations independently, which to some degree creates a performance bottleneck for the model. The other is that the Transformer cannot capture local information well, because the self-attention mechanism attends to all input positions and thus disperses the attention distribution.

Then, to address the first problem, this thesis proposes an interactive multi-head self-attention network, which connects all the attention heads of the multi-head self-attention networks through a linear projection so that the heads can share the information they have learned. This allows the Transformer to learn representations more fully and break through its performance bottleneck. To address the second problem, this thesis proposes a learnable Gaussian bias as a form of local modeling and adds it to the original self-attention networks (SANs), so that the improved SANs can attend to local information effectively.

Finally, the effectiveness of both improvements is verified through experiments, with BLEU as the evaluation metric. In the process of validating the second scheme, it is found that although local modeling improves the performance of the Transformer, it is not fully integrated into the self-attention networks. To address this, the thesis proposes a gated local modeling scheme that integrates local modeling into the self-attention networks via a gating mechanism, and its effectiveness is likewise demonstrated experimentally.
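To make the two proposed mechanisms more concrete, below is a minimal PyTorch sketch of a single attention layer that combines a head-mixing linear projection with a gated, learnable Gaussian locality bias. It is written from the description above, not from the thesis code: the class name InteractiveLocalSelfAttention, the head_mix projection, the exact Gaussian parameterization, and the placement of the sigmoid gate are all illustrative assumptions.

    # Hypothetical sketch, not the thesis implementation.
    import math
    import torch
    import torch.nn as nn

    class InteractiveLocalSelfAttention(nn.Module):
        def __init__(self, d_model=512, n_heads=8):
            super().__init__()
            self.h, self.d_k = n_heads, d_model // n_heads
            self.q_proj = nn.Linear(d_model, d_model)
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)
            # (1) interactive heads: a linear projection that mixes information across heads
            self.head_mix = nn.Linear(n_heads, n_heads, bias=False)
            # (2) learnable Gaussian local bias: one learnable window width (sigma) per head
            self.log_sigma = nn.Parameter(torch.zeros(n_heads))
            # gate controlling how strongly the local bias is injected (assumed form)
            self.gate = nn.Linear(d_model, n_heads)
            self.out = nn.Linear(d_model, d_model)

        def forward(self, x):                       # x: (batch, seq_len, d_model)
            b, n, _ = x.shape
            def split(t):                           # -> (batch, heads, seq_len, d_k)
                return t.view(b, n, self.h, self.d_k).transpose(1, 2)
            q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))

            logits = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)      # (b, h, n, n)

            # Gaussian bias centered on each query position i: -(j - i)^2 / (2 * sigma^2)
            pos = torch.arange(n, device=x.device, dtype=x.dtype)
            dist2 = (pos[None, :] - pos[:, None]) ** 2                  # (n, n)
            sigma = self.log_sigma.exp().view(1, self.h, 1, 1)
            local_bias = -dist2[None, None] / (2 * sigma ** 2)          # (1, h, n, n)

            # gated local modeling: a sigmoid gate scales the local bias per head and position
            g = torch.sigmoid(self.gate(x)).transpose(1, 2).unsqueeze(-1)  # (b, h, n, 1)
            attn = torch.softmax(logits + g * local_bias, dim=-1)
            ctx = attn @ v                                              # (b, h, n, d_k)

            # interactive heads: let heads exchange information through a linear projection
            ctx = self.head_mix(ctx.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
            ctx = ctx.transpose(1, 2).reshape(b, n, self.h * self.d_k)
            return self.out(ctx)

Adding the bias to the attention logits before the softmax is one common way to realize Gaussian local modeling; the thesis may differ in where the head interaction and the gate are applied.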
Keywords/Search Tags:Neural Machine Translation, Self-Attention, Interactive Self-Attention, Local Modeling