
Syntax-aware Unsupervised Neural Machine Translation

Posted on: 2019-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Wu
GTID: 2415330623463616
Subject: Computer technology

Abstract/Summary:
Neural machine translation has achieved state-of-the-art results on multiple translation tasks. However, its demand for high-quality parallel corpora with millions of sentence pairs makes it hard to extend to practical scenarios. Moreover, most current neural machine translation models learn the syntactic structure of sentences only implicitly through deep neural networks, which limits translation accuracy. Targeting both the extensibility and the accuracy of neural machine translation, this thesis proposes syntax-aware unsupervised neural machine translation, in which syntactic knowledge is incorporated to improve accuracy and supervision is removed to improve extensibility.

The baseline in this thesis, an unsupervised neural machine translation model, pushes neural machine translation to a zero-resource extreme: not a single pair of parallel sentences, nor any bilingual supervision signal, is required during corpus preprocessing, word-embedding generation, or model training. The model builds on word embeddings mapped into a shared space by unsupervised methods and learns through repeated iterations of self-training. In each iteration, de-noising and back-translation are executed in sequence: de-noising takes a corrupted version of a sentence and trains the model to reconstruct the original, while back-translation trains the model on a pseudo-parallel corpus generated by translating input sentences on the fly with the previous iteration's model.

The improved models in this thesis, syntax-aware unsupervised neural machine translation models, first parse the corpus, then use the syntax-enriched corpus to generate and map word embeddings, and finally incorporate lexicalized phrase-structure trees, linearized into sequences, into the model directly and explicitly. Depending on whether the model also receives linearized lexicalized phrase-structure trees as input, the improved models fall into two types: Tree2Tree and String2Tree. Both are trained in an unsupervised manner by repeatedly performing de-noising and back-translation.

This thesis implemented the baseline model as well as the improved Tree2Tree and String2Tree models, and ran translation experiments in both directions on the WMT14 English and French monolingual corpora. All three models, in variants that also differ in syntax-tag proportion and embedding-mapping approach, were evaluated quantitatively by BLEU score. In addition, the influence of syntax tags on the quality of word embeddings and of the embedding mapping, and of the mapping approach on the quality of the embedding mapping, was explored empirically. The experiments show that, in both English-to-French and French-to-English translation, explicitly incorporating syntactic information into unsupervised neural machine translation improves accuracy: String2Tree raised the BLEU score of the English-to-French task from a baseline of 9.82 to 12.79, while Tree2Tree raised the BLEU score of the French-to-English task from 10.29 to 10.94.
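The de-noising step described above is commonly implemented by corrupting the input sentence with word drops and a local shuffle, then training the model to reconstruct the original. A minimal sketch of such a noise function follows; the drop probability and shuffle window are illustrative assumptions, not values taken from the thesis:

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=None):
    """Corrupt a token sequence for de-noising training: randomly drop
    words, then apply a local shuffle so each surviving token moves at
    most a few positions from where it started."""
    rng = rng or random.Random(0)
    # Word dropout: delete each token with probability drop_prob,
    # but never return an empty sequence.
    kept = [t for t in tokens if rng.random() > drop_prob]
    if not kept:
        kept = [rng.choice(tokens)]
    # Local shuffle: sort by original index plus a small random offset,
    # which only permutes tokens within a narrow window.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

sentence = "the cat sat on the mat".split()
noisy = add_noise(sentence)
# The model is then trained to map `noisy` back to `sentence`.
```

The back-translation half of the iteration pairs each monolingual sentence with its on-the-fly translation from the previous iteration's model, and trains on those pseudo-parallel pairs in the reverse direction.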
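The Tree2Tree and String2Tree variants consume or emit lexicalized phrase-structure trees serialized as flat token sequences, so an ordinary sequence model can handle them. A hedged sketch of one such linearization, using bracketed constituency notation; the tree representation and traversal here are illustrative assumptions, not the thesis's exact scheme:

```python
def linearize(tree):
    """Serialize a nested (label, children) constituency tree into a
    flat token sequence with explicit opening and closing brackets."""
    label, children = tree
    if isinstance(children, str):           # leaf: (POS tag, word)
        return ["(" + label, children, ")"]
    tokens = ["(" + label]                  # open the constituent
    for child in children:
        tokens.extend(linearize(child))     # recurse into subtrees
    tokens.append(")")                      # close the constituent
    return tokens

# "the cat sleeps" as a tiny phrase-structure tree
tree = ("S", [("NP", [("DT", "the"), ("NN", "cat")]),
              ("VP", [("VBZ", "sleeps")])])
print(" ".join(linearize(tree)))
# (S (NP (DT the ) (NN cat ) ) (VP (VBZ sleeps ) ) )
```

A String2Tree model would take the plain word sequence as input and be trained to produce such a bracketed sequence; a Tree2Tree model would take the bracketed sequence on both sides.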
Keywords/Search Tags:machine translation, neural machine translation, unsupervised, parsing, syntax