Research On Improving Translation Diversity In Back-Translation

Posted on:2021-01-06

Degree:Master

Type:Thesis

Country:China

Candidate:H K Zhu

GTID:2518306017459624

Subject:Computer technology

Abstract/Summary:

In recent years, neural machine translation (NMT) models built on the encoder-decoder framework have achieved strong results on many language pairs. With the introduction of the attention mechanism, NMT performance improved further and surpassed traditional statistical machine translation systems. Despite this, NMT still faces many challenges, especially regarding data. Increasingly large models demand more and better data, but building high-quality parallel corpora with millions of sentence pairs is expensive. Research also shows that more diverse data benefits model training, so it is worthwhile to study how to use existing data more efficiently.

Back-translation expands the parallel corpus by translating target-side monolingual text into the source language, which enriches the training data to some extent. During decoding, however, back-translation is constrained by greedy strategies, so the generated translations lack diversity. This thesis therefore proposes two methods to increase data diversity in back-translation.

The first method introduces a fluency boost learning strategy into the NMT training process. Fluency boosting originates in grammatical error correction, which is typically used to find and correct writing errors made by foreign-language beginners. The goal of fluency boost learning here is to find and correct translation errors iteratively, yielding more diverse samples and a higher-quality corpus. The advantage of this method is that it changes only the training process, not the structure of the translation model. It is therefore model-independent in principle and can be easily ported to any machine translation framework.

The second method introduces evolutionary algorithms into the beam search decoding phase: crossover and mutation between predicted sequences generate additional candidates, giving far more choices for sampling than beam search alone. Evolutionary algorithms are inspired by genetic variation in nature, the mechanism that maintains biodiversity. Following this idea, we select the winning sequences from the beam search output space and then simulate gene recombination and mutation to generate more candidate sequences, thereby increasing the diversity of NMT decoding.

Experimental results on the WMT'18 English-German news translation task show that, compared with the conventional back-translation data augmentation method, our method improves translation quality by up to 0.5 BLEU points.
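The evolutionary step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the operators, so everything below is an assumption: the function names (`crossover`, `mutate`, `expand_candidates`), the single-point crossover, the uniform token-replacement mutation, and the mutation rate are hypothetical choices made only for illustration. The sketch works on plain token lists; a real system would presumably rescore the new candidates with the translation model before sampling from them.

```python
import random
from typing import List, Sequence


def crossover(a: Sequence[str], b: Sequence[str], rng: random.Random) -> List[str]:
    """Single-point crossover: a prefix of `a` followed by a suffix of `b`."""
    if len(a) < 2 or len(b) < 2:
        return list(a)  # too short to cut; return the first parent unchanged
    cut_a = rng.randint(1, len(a) - 1)  # keep at least one token of `a`
    cut_b = rng.randint(1, len(b) - 1)  # keep at least one token of `b`
    return list(a[:cut_a]) + list(b[cut_b:])


def mutate(seq: Sequence[str], vocab: Sequence[str], rate: float,
           rng: random.Random) -> List[str]:
    """Replace each token with a random vocabulary token with probability `rate`."""
    return [rng.choice(vocab) if rng.random() < rate else tok for tok in seq]


def expand_candidates(beams: List[List[str]], vocab: Sequence[str],
                      n_children: int, rate: float = 0.1,
                      seed: int = 0) -> List[List[str]]:
    """Return the original beam hypotheses plus `n_children` recombined ones.

    Assumes `beams` holds at least two hypotheses (the "winning sequences").
    """
    rng = random.Random(seed)
    children = []
    for _ in range(n_children):
        parent_a, parent_b = rng.sample(beams, 2)  # two distinct parents
        child = crossover(parent_a, parent_b, rng)
        children.append(mutate(child, vocab, rate, rng))
    return list(beams) + children


if __name__ == "__main__":
    beams = [["das", "ist", "gut"],
             ["es", "ist", "sehr", "gut"],
             ["das", "war", "gut"]]
    vocab = ["das", "es", "ist", "war", "sehr", "gut", "schön"]
    for candidate in expand_candidates(beams, vocab, n_children=5):
        print(" ".join(candidate))
```

The enlarged pool keeps the original beam outputs and adds recombined variants, which is one way to obtain "more choices for sampling than beam search" when producing synthetic source sentences for back-translation.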

Keywords/Search Tags:

Neural Machine Translation, Data Diversity, Back-Translation, Fluency Boost Learning, Evolutionary Algorithms

Related items

1	Research And Implementation Of Uyghur-Chinese Machine Translation Based On Data Augmentation Technology
2	Research On Unsupervised Neural Machine Translation Technique
3	Research And Implementation On Uyghur-Chinese Neural Machine Translation
4	Research On Chinese-to-English Machine Translation Based On Neural Network
5	Implementation And Optimization Of Mongolian-Chinese Neural Machine Translation System Based On Dual Learning
6	Research On Model Learning For Machine Translation
7	Research On Integrating Optimization Methods And Translation Reranking For Mongolian-Chinese Machine Translation
8	Research On Tibetan-Chinese Machine Translation Under The Condition Of Sparse Resources
9	Research On Mongolian-Chinese Neural Machine Translation Based On Monolingual Corpus And Reinforcement Learning
10	Methods For Handling OOV In Chinese-Uyghur Neural Machine Translation