
Syntax-aware Unsupervised Neural Machine Translation

Posted on: 2019-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Wu
GTID: 2415330623463616
Subject: Computer technology

Abstract/Summary:
Neural machine translation has achieved state-of-the-art results on multiple translation tasks. However, its demand for high-quality parallel corpora with millions of sentence pairs makes it hard to extend to practical scenarios. Moreover, most current neural machine translation models learn the syntactic structure of sentences only implicitly through deep neural networks, which limits translation accuracy. Targeting both the extensibility and the accuracy of neural machine translation, this thesis proposes syntax-aware unsupervised neural machine translation, in which syntactic knowledge is incorporated to improve accuracy and supervision is removed to improve extensibility.

The baseline in this thesis, an unsupervised neural machine translation model, pushes neural machine translation to a zero-resource extreme: not a single pair of parallel sentences, nor any bilingual supervision signal, is required during corpus preprocessing, word-embedding generation, or model training. The model builds on word embeddings mapped into a shared space by unsupervised methods and learns through repeated iterations of self-training. In each iteration, de-noising and back-translation are executed in sequence: de-noising takes a corrupted version of a sentence and trains the model to reconstruct the original, while back-translation trains the model on a pseudo-parallel corpus generated by translating input sentences on the fly with the previous iteration's model.

The improved models in this thesis, syntax-aware unsupervised neural machine translation models, first parse the corpus, then use the syntax-enriched corpus to generate and map word embeddings, and finally incorporate lexicalized phrase-structure trees, linearized into sequences, into the model directly and explicitly. Depending on whether the model also receives linearized lexicalized phrase-structure trees as input, the improved models fall into two types: Tree2Tree and String2Tree. Both are trained in an unsupervised manner by repeatedly performing de-noising and back-translation.

This thesis implemented the baseline model as well as the improved Tree2Tree and String2Tree models, and ran translation experiments in both directions on the WMT14 English and French monolingual corpora. All three models, in variants that also differ in syntax-tag proportion and embedding-mapping approach, were evaluated quantitatively by BLEU score. In addition, the influence of syntax tags on the quality of word embeddings and of the embedding mapping, and of the mapping approach on the quality of the embedding mapping, was explored empirically. The experiments show that, in both English-to-French and French-to-English translation, explicitly incorporating syntactic information into unsupervised neural machine translation improves accuracy: String2Tree raised the BLEU score of the English-to-French task from a baseline of 9.82 to 12.79, while Tree2Tree raised the BLEU score of the French-to-English task from 10.29 to 10.94.
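The de-noising step described above is commonly implemented by corrupting the input sentence with word drops and a local shuffle, then training the model to reconstruct the original. A minimal sketch of such a noise function follows; the drop probability and shuffle window are illustrative assumptions, not values taken from the thesis:

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=None):
    """Corrupt a token sequence for de-noising training: randomly drop
    words, then apply a local shuffle so each surviving token moves at
    most a few positions from where it started."""
    rng = rng or random.Random(0)
    # Word dropout: delete each token with probability drop_prob,
    # but never return an empty sequence.
    kept = [t for t in tokens if rng.random() > drop_prob]
    if not kept:
        kept = [rng.choice(tokens)]
    # Local shuffle: sort by original index plus a small random offset,
    # which only permutes tokens within a narrow window.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

sentence = "the cat sat on the mat".split()
noisy = add_noise(sentence)
# The model is then trained to map `noisy` back to `sentence`.
```

The back-translation half of the iteration pairs each monolingual sentence with its on-the-fly translation from the previous iteration's model, and trains on those pseudo-parallel pairs in the reverse direction.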
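The Tree2Tree and String2Tree variants consume or emit lexicalized phrase-structure trees serialized as flat token sequences, so an ordinary sequence model can handle them. A hedged sketch of one such linearization, using bracketed constituency notation; the tree representation and traversal here are illustrative assumptions, not the thesis's exact scheme:

```python
def linearize(tree):
    """Serialize a nested (label, children) constituency tree into a
    flat token sequence with explicit opening and closing brackets."""
    label, children = tree
    if isinstance(children, str):           # leaf: (POS tag, word)
        return ["(" + label, children, ")"]
    tokens = ["(" + label]                  # open the constituent
    for child in children:
        tokens.extend(linearize(child))     # recurse into subtrees
    tokens.append(")")                      # close the constituent
    return tokens

# "the cat sleeps" as a tiny phrase-structure tree
tree = ("S", [("NP", [("DT", "the"), ("NN", "cat")]),
              ("VP", [("VBZ", "sleeps")])])
print(" ".join(linearize(tree)))
# (S (NP (DT the ) (NN cat ) ) (VP (VBZ sleeps ) ) )
```

A String2Tree model would take the plain word sequence as input and be trained to produce such a bracketed sequence; a Tree2Tree model would take the bracketed sequence on both sides.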
Keywords/Search Tags:machine translation, neural machine translation, unsupervised, parsing, syntax