In the era of economic globalization, translation between different languages is becoming increasingly frequent. Compared with manual translation, machine translation is much faster. With the development of artificial neural networks, Neural Machine Translation (NMT) has attracted wide attention as a new approach and has become the mainstream method of machine translation. However, existing research on NMT still has the following shortcomings: (1) NMT models rely on large-scale parallel corpora for training; (2) NMT models are not robust to texts containing spelling and grammatical errors, that is, when the input text contains noise, the translation produced by the model is inaccurate. This paper accomplishes the following tasks:

(1) An NMT model based on dual learning and the Masked Language Model (MLM) is proposed. The dual-learning-based NMT model uses one encoder and two decoders (a source-language decoder and a target-language decoder), and a reinforcement learning algorithm is used during training to effectively improve translation performance. The MLM is an undirected sequence model that improves translation quality by exploiting both left and right context, and it adopts a parallel decoding strategy. Compared with the traditional autoregressive model, it significantly speeds up decoding while maintaining good translation quality. In this paper, we combine the dual learning method with the MLM and apply the combination to document-level translation, which shortens the training time of the NMT model while preserving good translation performance.

(2) To improve the robustness of the NMT model, we describe methods for attacking it with noisy text. In machine translation, small perturbations in the input may degrade the translation performance of an NMT model. This paper proposes to first train the NMT model on clean corpora and then attack it with noisy corpora. A variety of noise generation methods can be used to produce texts with spelling errors, grammatical errors, or emoji from clean corpora. We conducted experiments at the word level and the subword level to compare the translation performance of different methods. The experimental results show that, in word-level adversarial training, the KNN algorithm and the character-changing method improve the robustness of the NMT model, while in subword-level adversarial training, attacking the NMT model with adversarial subword regularization achieves better performance on several translation tasks.
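To make the parallel decoding strategy mentioned in contribution (1) concrete, the following is a minimal sketch of mask-predict-style iterative refinement, the standard way an MLM decoder fills in all target positions in parallel and then re-masks and re-predicts the least confident ones. The predict_fn interface and the iteration schedule here are hypothetical stand-ins for illustration only; they are not taken from the thesis implementation.

    MASK = "<mask>"

    def mask_predict_decode(predict_fn, target_length, num_iterations=4):
        """Decode a sentence of target_length tokens in a fixed number of passes.

        predict_fn(tokens) -> list of (token, probability) pairs, one per
        position, giving the model's best guess and confidence everywhere.
        """
        # Start with every target position masked.
        tokens = [MASK] * target_length
        probs = [0.0] * target_length

        for it in range(num_iterations):
            # Predict all positions in parallel (a single forward pass),
            # instead of generating one token at a time autoregressively.
            predictions = predict_fn(tokens)
            for i, (tok, p) in enumerate(predictions):
                tokens[i], probs[i] = tok, p

            if it == num_iterations - 1:
                break

            # Re-mask the least confident tokens; the masked fraction
            # shrinks on each iteration until the sentence is complete.
            num_to_mask = int(target_length * (1 - (it + 1) / num_iterations))
            worst = sorted(range(target_length), key=lambda i: probs[i])[:num_to_mask]
            for i in worst:
                tokens[i] = MASK
                probs[i] = 0.0

        return tokens

Because the number of forward passes is a small constant rather than the sentence length, this style of decoding is what allows the MLM-based model to be much faster than an autoregressive decoder while keeping translation quality competitive.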
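For contribution (2), the sketch below illustrates one simple family of noise generation methods, character-level perturbations (swapping, dropping, or repeating characters) applied to clean text to simulate spelling errors. The operations and the noise rate shown are assumptions for illustration; the thesis may use different perturbations, and its KNN-based and subword-regularization attacks are not shown here.

    import random

    def add_char_noise(sentence, noise_prob=0.1, seed=None):
        """Return a noisy copy of sentence with typo-like character errors."""
        rng = random.Random(seed)
        noisy_words = []
        for word in sentence.split():
            if len(word) > 3 and rng.random() < noise_prob:
                op = rng.choice(["swap", "drop", "repeat"])
                # Skip the first character so the word stays recognizable.
                i = rng.randrange(1, len(word) - 1)
                if op == "swap":
                    chars = list(word)
                    chars[i], chars[i + 1] = chars[i + 1], chars[i]
                    word = "".join(chars)
                elif op == "drop":
                    word = word[:i] + word[i + 1:]
                else:  # repeat a character, mimicking a typo
                    word = word[:i] + word[i] + word[i:]
            noisy_words.append(word)
        return " ".join(noisy_words)

    print(add_char_noise("the quick brown fox jumps over the lazy dog", 0.5, seed=0))

Feeding such perturbed sentences to a model trained only on clean corpora exposes the robustness gap described above; mixing them into training (adversarial training) is what the word-level and subword-level experiments compare.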