In recent years, neural machine translation (NMT) has developed rapidly. With growing application requirements and an increasing variety of application scenarios, document-level NMT has gradually received more and more attention. How to effectively model documents for document-level NMT and extract useful information from document-level context has become a hot research topic in NMT. In this paper, we focus on better utilization of document-level contexts. The research makes contributions in the following three aspects:

(1) Hierarchical global context-enhanced NMT. We propose to improve the performance of NMT by properly exploiting global context. First, we obtain sentence-level vectors by linearly combining word-level hidden states. Then, we model dependencies at both the word level and the sentence level, i.e., between each word in the current sentence and each sentence in the document, and between pairs of sentences. Finally, the global context extracted at both levels is incorporated into the translation model, so that each word in the current sentence is equipped with its own unique global context. Experimental results on various document-level translation tasks show that the proposed approach significantly improves translation performance.

(2) Pre-training the context extractor for context-aware NMT. Due to the limited scale of document-level parallel corpora, it is hard to train a well-behaved context extractor that effectively captures useful global context. Therefore, to strengthen the context extractor's ability to capture global context, we propose to pre-train it on a large-scale monolingual document-level dataset. Specifically, we propose a novel self-supervised pre-training task that recovers sentences within document-level context. The pre-trained context extractor can then be used by downstream context-aware NMT models. Detailed experimental results on various document-level translation
tasks show that our pre-training approach significantly boosts the performance of various downstream context-aware NMT models.

(3) Enhancing the model for document-level NMT. Context-aware NMT suffers from the limited size of document-level parallel datasets. To break this corpus bottleneck, we propose to use both a large-scale sentence-level parallel dataset and (source-side) monolingual documents to enhance the translation model and the global context extractor, respectively. To this end, we jointly pre-train on sentence-level translation and document-level sentence recovery. The pre-trained model is then fine-tuned on the document-level parallel dataset. Experimental results on various document-level translation tasks show that our approach clearly improves translation performance. A nice property of our approach is that the fine-tuned model can translate both sentences and documents.
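The first contribution obtains sentence-level vectors by linearly combining word-level hidden states. A minimal sketch of one such combination, using softmax-normalized dot-product attention weights (the query vector and NumPy setting are illustrative assumptions, not the thesis's exact parameterization):

```python
import numpy as np

def sentence_vector(word_states, query):
    """Collapse word-level hidden states (n_words, d) into a single
    sentence-level vector: score each word against a query vector,
    softmax-normalize the scores, and take the weighted sum."""
    scores = word_states @ query                 # (n_words,)
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()            # softmax attention weights
    return weights @ word_states                 # (d,) linear combination

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))   # hidden states of a 5-word sentence
q = rng.normal(size=8)        # hypothetical attention query
s = sentence_vector(H, q)
print(s.shape)                # (8,)
```

Stacking such vectors for every sentence in a document gives the sentence-level representations over which the word-to-sentence and sentence-to-sentence dependencies can be modeled.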
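The second contribution pre-trains the context extractor with a self-supervised task that recovers sentences within document-level context. One simple way to construct such training examples from monolingual documents, shown here as a sketch (the mask token and exact example format are assumptions):

```python
def make_recovery_examples(document, mask_token="<mask>"):
    """Build self-supervised sentence-recovery examples: each sentence
    of the document is in turn replaced by a mask token, and the model
    must recover it from the surrounding document context."""
    examples = []
    for i, sent in enumerate(document):
        context = document[:i] + [mask_token] + document[i + 1:]
        examples.append({"context": context, "target": sent})
    return examples

doc = ["He opened the door.", "The room was dark.", "He lit a candle."]
examples = make_recovery_examples(doc)
print(len(examples))  # 3, one example per sentence
```

Because this construction needs only monolingual documents, it sidesteps the scarcity of document-level parallel data.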
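The third contribution jointly pre-trains on sentence-level translation and document-level sentence recovery. A common way to realize such joint training is to interleave mini-batches from the two data sources; the sketch below illustrates that scheduling idea (the alternating schedule and task labels are assumptions, not the thesis's exact recipe):

```python
def joint_batches(translation_batches, recovery_batches):
    """Interleave sentence-level translation batches (from parallel
    data) with document-level sentence-recovery batches (from
    monolingual documents), yielding (task, batch) pairs so one model
    can be optimized on both objectives."""
    for mt_batch, rec_batch in zip(translation_batches, recovery_batches):
        yield ("translate", mt_batch)
        yield ("recover", rec_batch)

stream = list(joint_batches([["s1 -> t1"], ["s2 -> t2"]],
                            [["doc A"], ["doc B"]]))
print(len(stream))  # 4 batches, alternating between the two tasks
```

After this joint pre-training, the model is fine-tuned on the (smaller) document-level parallel dataset.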