
Research and Application of a Neural Machine Translation Model Based on the Attention Mechanism

Posted on: 2022-07-05  Degree: Master  Type: Thesis
Country: China  Candidate: J G Zhang  Full Text: PDF
GTID: 2518306524489354  Subject: Master of Engineering
Abstract/Summary:
With the continuous advancement of artificial intelligence technology, existing machine models have largely achieved perceptual intelligence and are moving toward cognitive intelligence. Natural language processing is the foundation of cognitive intelligence and a research hotspot in both academia and industry. To meet society's need for communication across languages, as exchanges between countries grow ever more frequent, low-cost machine translation research is flourishing. As deep learning techniques have matured, machine translation has absorbed these methods and strategies and achieved good results on many tasks. Nevertheless, several shortcomings remain.

First, most translation models rely on the attention mechanism to align words between the two languages. However, softmax-based normalization always assigns some attention mass to irrelevant words, so obtaining a more precise attention distribution is important. Second, most neural translation models follow the encoder-decoder structure and translate autoregressively: each word is generated conditioned on the words already produced, which makes decoding inefficient and denies the model a global view of the translation. Finally, word vectors are the basis on which the model acquires semantic and grammatical information, so obtaining word vectors that carry more comprehensive semantic and grammatical information is essential.

Based on the problems above, this thesis conducts the following research:

1. Aiming at precise attention alignment in translation, this thesis replaces the commonly used softmax normalization with a sparse normalization method (sparsemax; see the first sketch below) and evaluates it on a Transformer-based neural machine translation system. The experimental results show that by concentrating weight sparsely on the most relevant words, the method reduces the unnecessary weight assigned to irrelevant words, alleviates inductive bias in the data, and improves both the accuracy and the interpretability of the translation system.

2. Aiming at the problem that the Transformer's decoding time in the inference stage grows quadratically with translation length, this thesis adopts a cumulative average attention layer to alleviate it (second sketch below). In addition, because an autoregressive model can use only the preceding words when generating a sequence, this thesis integrates the idea of deliberation ("scrutinizing") networks and obtains global information about the generated sentence through two decoding passes (third sketch below). The experimental results show that sentences produced after two decoding passes are more coherent and more complete in meaning.

3. Given that most current models adopt word-level embedding representations, a multi-representation fusion word vector is proposed, in which a character-level encoding vector is directly concatenated with the word-level encoding vector (final sketch below). The fused word vector effectively handles out-of-vocabulary and low-frequency words, expresses more complete word-meaning information, and directly benefits the performance of the entire translation model. The experimental results show that the proposed fusion methods and strategies effectively improve the translation quality of the overall model.
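The first contribution swaps softmax for sparsemax (Martins & Astudillo, 2016) when normalizing attention scores. Below is a minimal PyTorch sketch of sparsemax for illustration only; it is not the thesis's actual code, and all names are our own. Unlike softmax, sparsemax projects the scores onto the probability simplex and can assign exactly zero weight to irrelevant words.

```python
# Minimal sparsemax sketch (Martins & Astudillo, 2016); illustrative, not the thesis's code.
import torch

def sparsemax(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Euclidean projection of scores onto the simplex; output can contain exact zeros."""
    z, _ = torch.sort(scores, dim=dim, descending=True)
    cumsum = z.cumsum(dim=dim)
    k = torch.arange(1, scores.size(dim) + 1, device=scores.device, dtype=scores.dtype)
    shape = [1] * scores.dim()
    shape[dim] = -1
    k = k.view(shape)                              # reshape k for broadcasting along `dim`
    support = (1 + k * z) > cumsum                 # positions kept in the sparse support
    k_max = support.sum(dim=dim, keepdim=True).to(scores.dtype)
    support_sum = torch.where(support, z, torch.zeros_like(z)).sum(dim=dim, keepdim=True)
    tau = (support_sum - 1) / k_max                # threshold subtracted from every score
    return torch.clamp(scores - tau, min=0)

scores = torch.tensor([2.0, 1.5, 0.1, -1.0])
print(torch.softmax(scores, dim=-1))   # every word receives some attention mass
print(sparsemax(scores, dim=-1))       # tensor([0.7500, 0.2500, 0.0000, 0.0000])
```

On this toy input, sparsemax zeroes out the two low-scoring positions entirely, which is exactly the "no attention leaked to irrelevant words" behavior the abstract describes.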
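The cumulative average attention layer of the second contribution follows the average attention network of Zhang et al. (2018). The sketch below is a simplified version under stated assumptions: it replaces decoder self-attention with a cumulative mean of the inputs and gates it against the current input, omitting the paper's feed-forward sublayer; layer sizes are illustrative.

```python
# Simplified cumulative average attention (after Zhang et al., 2018); illustrative sketch.
import torch

class AverageAttention(torch.nn.Module):
    """g_t = (1/t) * sum_{k<=t} y_k, combined with y_t through learned gates.
    At inference a running sum makes each step O(1) instead of O(t)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = torch.nn.Linear(2 * d_model, 2 * d_model)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, seq_len, d_model); cumulative mean over positions 1..t
        t = torch.arange(1, y.size(1) + 1, device=y.device, dtype=y.dtype).view(1, -1, 1)
        g = y.cumsum(dim=1) / t
        # gated combination of the current input and its averaged left context
        i_gate, f_gate = self.gate(torch.cat([y, g], dim=-1)).chunk(2, dim=-1)
        return torch.sigmoid(i_gate) * y + torch.sigmoid(f_gate) * g

layer = AverageAttention(d_model=8)
out = layer(torch.randn(2, 5, 8))
print(out.shape)  # torch.Size([2, 5, 8])
```

Because the averaged context at step t depends only on a running sum, the per-step inference cost is constant, in contrast to self-attention, whose per-step cost grows with the prefix length.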
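The two-pass decoding idea can be illustrated with a toy sketch. The construction below is our own simplification, not the thesis's model: a second decoder attends over the concatenation of the encoder states and the first-pass draft states, so the refinement pass sees global information about the whole output. Real deliberation networks use separate attention modules over source and draft, and the first pass would decode autoregressively rather than consume stand-in embeddings.

```python
# Toy two-pass ("deliberation"-style) decoding sketch; all tensors are stand-ins.
import torch

d = 16
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d, 2, batch_first=True), 1)
first_pass = torch.nn.TransformerDecoder(
    torch.nn.TransformerDecoderLayer(d, 2, batch_first=True), 1)
second_pass = torch.nn.TransformerDecoder(
    torch.nn.TransformerDecoderLayer(d, 2, batch_first=True), 1)

src = torch.randn(1, 7, d)        # toy source sentence embeddings
memory = encoder(src)
draft_in = torch.randn(1, 9, d)   # stand-in for first-pass target embeddings
draft = first_pass(draft_in, memory)           # pass 1: draft translation states
# pass 2 attends over source states AND the whole draft, gaining a global view
memory2 = torch.cat([memory, draft], dim=1)
refined = second_pass(draft_in, memory2)
print(refined.shape)              # torch.Size([1, 9, 16])
```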
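Finally, the multi-representation fusion word vector directly concatenates a character-level encoding with the word-level embedding. In the sketch below, characters are composed with a small GRU; the composer choice and all layer sizes are illustrative assumptions, since the abstract specifies only the direct concatenation.

```python
# Character + word embedding concatenation; sizes and GRU composer are assumptions.
import torch

class CharWordEmbedding(torch.nn.Module):
    def __init__(self, word_vocab: int, char_vocab: int, d_word: int = 12, d_char: int = 4):
        super().__init__()
        self.word_emb = torch.nn.Embedding(word_vocab, d_word)
        self.char_emb = torch.nn.Embedding(char_vocab, d_char)
        self.char_rnn = torch.nn.GRU(d_char, d_char, batch_first=True)

    def forward(self, word_ids: torch.Tensor, char_ids: torch.Tensor) -> torch.Tensor:
        # word_ids: (batch, seq); char_ids: (batch, seq, max_chars)
        b, s, c = char_ids.shape
        chars = self.char_emb(char_ids).view(b * s, c, -1)
        _, h = self.char_rnn(chars)              # final hidden state summarizes each word
        char_vec = h[-1].view(b, s, -1)
        # direct concatenation: rare or unseen words still carry a char-level signal
        return torch.cat([self.word_emb(word_ids), char_vec], dim=-1)

emb = CharWordEmbedding(word_vocab=100, char_vocab=30)
out = emb(torch.randint(0, 100, (2, 5)), torch.randint(0, 30, (2, 5, 7)))
print(out.shape)  # torch.Size([2, 5, 16])
```

Because the character path never depends on the word vocabulary, an out-of-vocabulary word mapped to an unknown-word embedding still receives a meaningful character-level component, which is how the fused vector helps with rare and unseen words.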
Keywords/Search Tags: Neural Machine Translation, Attention Mechanism, Deep Learning, Natural Language Processing, Sparsemax