
Research On Machine Translation Model Based On Self-Attention Mechanism

Posted on: 2021-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: S Fang
Full Text: PDF
GTID: 2428330605450055
Subject: Electronics and Communications Engineering
Abstract/Summary:
Language is an important carrier of knowledge and information. With the rapid advancement of the Internet, social informatization, and economic globalization, overcoming language barriers has become increasingly important. Machine translation is therefore of great practical significance for breaking through language barriers among countries, regions, and nations, for promoting communication among different peoples, and for reducing the burden of learning foreign languages.

This thesis first briefly introduces conventional statistical machine translation (SMT) and neural machine translation (NMT) and examines these models in terms of their respective advantages and disadvantages. On this basis, it describes the Transformer, a neural machine translation model built on the self-attention mechanism and multi-head self-attention, and identifies several shortcomings of the self-attention mechanism through targeted experiments. To address these shortcomings, the thesis proposes corresponding improvements, as follows.

First, through detailed theoretical and experimental analysis of the self-attention mechanism and the Transformer model, two problems are identified. One is that the attention heads of the multi-head self-attention networks in the Transformer learn their representations independently, which to some degree creates a performance bottleneck for the model. The other is that the Transformer cannot capture local information well, because the self-attention mechanism attends to all input positions and thus disperses the attention distribution.

Then, to address the first problem, this thesis proposes an interactive multi-head self-attention network, which connects all the attention heads of the multi-head self-attention networks through a linear projection so that the heads can share the information they have learned. This allows the Transformer to learn representations more fully and break through its performance bottleneck. To address the second problem, this thesis proposes a learnable Gaussian bias as a form of local modeling and adds it to the original self-attention networks (SANs), so that the improved SANs can attend to local information effectively.

Finally, the effectiveness of both improvements is verified through experiments, with BLEU as the evaluation metric. In the process of validating the second scheme, it is found that although local modeling improves the performance of the Transformer, it is not fully integrated into the self-attention networks. To address this, the thesis proposes a gated local modeling scheme that integrates local modeling into the self-attention networks via a gating mechanism, and its effectiveness is likewise demonstrated experimentally.
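To make the two proposed mechanisms more concrete, below is a minimal PyTorch sketch of a single attention layer that combines a head-mixing linear projection with a gated, learnable Gaussian locality bias. It is written from the description above, not from the thesis code: the class name InteractiveLocalSelfAttention, the head_mix projection, the exact Gaussian parameterization, and the placement of the sigmoid gate are all illustrative assumptions.

    # Hypothetical sketch, not the thesis implementation.
    import math
    import torch
    import torch.nn as nn

    class InteractiveLocalSelfAttention(nn.Module):
        def __init__(self, d_model=512, n_heads=8):
            super().__init__()
            self.h, self.d_k = n_heads, d_model // n_heads
            self.q_proj = nn.Linear(d_model, d_model)
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)
            # (1) interactive heads: a linear projection that mixes information across heads
            self.head_mix = nn.Linear(n_heads, n_heads, bias=False)
            # (2) learnable Gaussian local bias: one learnable window width (sigma) per head
            self.log_sigma = nn.Parameter(torch.zeros(n_heads))
            # gate controlling how strongly the local bias is injected (assumed form)
            self.gate = nn.Linear(d_model, n_heads)
            self.out = nn.Linear(d_model, d_model)

        def forward(self, x):                       # x: (batch, seq_len, d_model)
            b, n, _ = x.shape
            def split(t):                           # -> (batch, heads, seq_len, d_k)
                return t.view(b, n, self.h, self.d_k).transpose(1, 2)
            q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))

            logits = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)      # (b, h, n, n)

            # Gaussian bias centered on each query position i: -(j - i)^2 / (2 * sigma^2)
            pos = torch.arange(n, device=x.device, dtype=x.dtype)
            dist2 = (pos[None, :] - pos[:, None]) ** 2                  # (n, n)
            sigma = self.log_sigma.exp().view(1, self.h, 1, 1)
            local_bias = -dist2[None, None] / (2 * sigma ** 2)          # (1, h, n, n)

            # gated local modeling: a sigmoid gate scales the local bias per head and position
            g = torch.sigmoid(self.gate(x)).transpose(1, 2).unsqueeze(-1)  # (b, h, n, 1)
            attn = torch.softmax(logits + g * local_bias, dim=-1)
            ctx = attn @ v                                              # (b, h, n, d_k)

            # interactive heads: let heads exchange information through a linear projection
            ctx = self.head_mix(ctx.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
            ctx = ctx.transpose(1, 2).reshape(b, n, self.h * self.d_k)
            return self.out(ctx)

Adding the bias to the attention logits before the softmax is one common way to realize Gaussian local modeling; the thesis may differ in where the head interaction and the gate are applied.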
Keywords/Search Tags:Neural Machine Translation, Self-Attention, Interactive Self-Attention, Local Modeling