
Research On Document-Level Neural Machine Translation

Posted on: 2021-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
Full Text: PDF
GTID: 2428330605974882
Subject: Computer technology
Abstract/Summary:
At present, Neural Machine Translation (NMT) is the main research direction in machine translation. NMT research usually takes sentence-level translation as its object: during translation, each sentence is treated as an independent unit, so the contextual information of the sentence within its document is ignored. To exploit document-level information and generate more appropriate translations, so that translations remain consistent in style and accurate across the whole document or a specific semantic environment, this thesis proposes three methods.

(1) Context Recovery for Document-Level Neural Machine Translation. Sentence-level NMT suffers from incomplete semantic representation because the context of the current sentence is not considered. We extract document-level information from each sentence by dependency parsing and then complement the source sentences with this information, making their semantic representations more complete. We conduct experiments on the Chinese-English language pair and, to address the scarcity of document-level parallel corpora, propose a training method that exploits large-scale parallel corpora.

(2) Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation. To capture inter-sentential dependencies, document-level NMT usually integrates the information of context sentences into the current sentence. We propose a document-level framework that models cross-sentence dependencies by training the NMT model to predict both the target translation and the surrounding sentences of a source sentence. By forcing the model to predict the source context, we encourage it to learn "contextualized" source sentence representations that capture document-level dependencies on the source side.

(3) Fusing Context-Aware Sentence Representations for Document-Level Neural Machine Translation. When translating documents, an NMT system translates sentence by sentence without considering the representations of the other sentences in the document. We propose a document-level NMT model that uses an additional context encoder to learn context-aware representations of the source sentence with respect to the other sentences in the document, and then integrates these representations into both the encoder and the decoder. Compared with method (2), this method can exploit information from more source and target sentences.
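As a rough illustration of the multi-task idea in method (2), the training objective could take the form below; the auxiliary surrounding-sentence term and its weight λ are assumptions made for this sketch rather than details stated in the abstract:

\[
\mathcal{L}(\theta) = -\log P(y \mid x; \theta) \;-\; \lambda \sum_{c \in \mathcal{C}(x)} \log P(c \mid x; \theta)
\]

Here x is the current source sentence, y its reference translation, \(\mathcal{C}(x)\) the set of surrounding source sentences in the document, and λ balances translation against source-context prediction.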
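For method (3), the following is a minimal sketch of how representations from a separate context encoder might be fused into the current sentence's encoder states. The module names, the attention-plus-gate fusion, and all dimensions are illustrative assumptions; the thesis's exact way of integrating the representation into encoder and decoder is not specified in this abstract.

```python
# Sketch: fuse context-encoder states into current-sentence encoder states
# via attention followed by a learned gate (all details are assumptions).
import torch
import torch.nn as nn


class ContextFusion(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # Attention from current-sentence states (queries) to context states (keys/values).
        self.ctx_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate deciding how much context to mix into each source position.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, sent_states: torch.Tensor, ctx_states: torch.Tensor) -> torch.Tensor:
        # sent_states: (batch, src_len, d_model) -- encoder states of the current sentence
        # ctx_states:  (batch, ctx_len, d_model) -- states from the separate context encoder
        ctx_summary, _ = self.ctx_attn(sent_states, ctx_states, ctx_states)
        g = torch.sigmoid(self.gate(torch.cat([sent_states, ctx_summary], dim=-1)))
        # Convex combination of sentence states and attended context.
        return g * sent_states + (1.0 - g) * ctx_summary


if __name__ == "__main__":
    fusion = ContextFusion(d_model=512)
    sent = torch.randn(2, 20, 512)  # current sentence encoder output
    ctx = torch.randn(2, 60, 512)   # concatenated context-sentence encoder output
    print(fusion(sent, ctx).shape)  # torch.Size([2, 20, 512])
```

In the same spirit, the decoder could attend to these fused states, which is one plausible reading of integrating the context-aware representation on both the encoder and decoder sides.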
Keywords/Search Tags:Neural Machine Translation, Document, Context Recovery, Contextualized Sentence Representations