
Research On English-Chinese Translation Based On Google's Neural Machine Translation

Posted on: 2020-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Ma
Full Text: PDF
GTID: 2428330590977054
Subject: Computer software and theory
Abstract/Summary:
In recent years, with the re-emergence of deep learning, the neural machine translation (NMT) model has gradually replaced the traditional phrase-based statistical machine translation method. In particular, models based on the Seq2Seq architecture fit the end-to-end translation task well and have become a focus of industry researchers. However, compared with traditional statistical machine translation, NMT models, especially those trained on large-scale data sets, still have defects: training and inference are slower, and translations may be incomplete. At the same time, owing to the limited vocabulary size, NMT also suffers from an out-of-vocabulary (OOV) problem on unregistered words and rare words.

In response to the problems of incomplete translation and OOV mentioned above, we propose the following solutions. (1) To solve the OOV problem, we combine a common stemming technique with the data-compression algorithm BPE (byte pair encoding) in English text preprocessing and propose a different subword-based sequence segmentation method. With this method, we divide the English text into a sequence of subwords, and the Chinese text into a sequence of characters by unigram segmentation. (2) To prevent the decoder from producing incomplete translations, we propose an improved attention mechanism that strengthens the decoder's ability to obtain context information. Inspired by the computation process of traditional attention, the improved mechanism adopts a two-layer computing structure that focuses on the relationships among the decoder's context vectors at different moments, improving the ability of attention to capture the global context information of the encoder. We name this improved mechanism Deep-Attention.

Based on Google's neural machine translation system (GNMT), this thesis evaluates the two improvements mentioned above on three data sets of different scales. The results show that the improved word segmentation method can effectively solve the OOV problem and improve translation accuracy, obtaining an average BLEU improvement of 1.64 points. Deep-Attention shows only a weak advantage over traditional attention, with its BLEU score increased by just 0.3-0.6 points.
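The segmentation method in (1) builds on BPE, which repeatedly merges the most frequent adjacent symbol pair in the training corpus into a new subword symbol. A minimal learning sketch in Python (the corpus and function names are illustrative, not from the thesis; the stemming step the thesis applies before BPE is omitted here):

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations from a word-frequency dictionary."""
    # Represent each word as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        # Apply the merge everywhere it occurs.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

# Toy corpus: word -> frequency.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(corpus, 10)
```

At inference time, the learned merge list is replayed in order on each input word, so rare and unseen words decompose into known subwords instead of falling out of the vocabulary.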
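The abstract describes Deep-Attention in (2) only at a high level (a two-layer structure over the decoder's context vectors at different moments), so the following NumPy sketch is a speculative reconstruction, not the thesis's actual formulation: a first attention layer over the encoder states produces the usual context vector, and a second layer attends over the context vectors from previous decoding steps, with the two combined additively.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, values):
    """Dot-product attention: score each value row against the query,
    then return the softmax-weighted sum of the value rows."""
    weights = softmax(values @ query)
    return weights @ values

def deep_attention(dec_state, enc_states, past_contexts):
    """Two-layer attention sketch (assumption): layer 1 attends over
    encoder states; layer 2 attends over the decoder's past context
    vectors to expose relationships between decoding moments."""
    c_enc = attend(dec_state, enc_states)
    if not past_contexts:
        return c_enc
    c_hist = attend(c_enc, np.stack(past_contexts))
    return c_enc + c_hist  # additive combination is an assumption

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(5, 4))  # 5 encoder states of dimension 4
contexts = []
for step in range(3):                 # three simulated decoding steps
    dec_state = rng.normal(size=4)
    contexts.append(deep_attention(dec_state, enc_states, contexts))
```

The intended effect is that each step's context vector is informed by what earlier steps already attended to, which is one plausible way a second attention layer could help the decoder avoid dropping source content.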
Keywords/Search Tags: Neural Machine Translation, Seq2Seq Model, LSTM, Attention Mechanism