Research On Improving Translation Diversity In Back-Translation

Posted on: 2021-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: H K Zhu
Full Text: PDF
GTID: 2518306017459624
Subject: Computer technology
Abstract/Summary:
In recent years, neural machine translation (NMT) models built on the encoder-decoder framework have achieved encouraging results on many language pairs. With the introduction of the attention mechanism in particular, NMT performance improved further and surpassed that of traditional statistical machine translation systems. Although NMT performs well in many respects, it still faces many challenges, especially with regard to data. Increasingly large models place higher demands on data volume and quality, but constructing high-quality parallel corpora with millions of sentence pairs is often costly. Research also shows that more diverse data benefits model training, so it is worthwhile to study how to use existing data more efficiently. Back-translation expands the parallel corpus by translating target-side monolingual data into the source language, which enriches the training data to some extent. At decoding time, however, back-translation is limited by greedy strategies, so the predicted translations lack diversity. This thesis therefore proposes two methods for increasing data diversity in back-translation.

The first method introduces a fluency boost learning strategy into the NMT training process. Fluency boosting originates in grammatical error correction, which is typically used to find and correct writing errors made by foreign-language beginners. The goal of fluency boost learning is to find translation errors and correct them iteratively, yielding more diverse samples and a higher-quality corpus. The advantage of this method is that it does not modify the structure of the translation model, only the training process. It is therefore model-independent in theory and can be easily transferred to any machine translation framework.

The second method introduces evolutionary algorithms into the beam search decoding phase. Crossover and mutation between predicted sequences generate more candidate samples, so that there are many more candidates to sample from than plain beam search provides. The idea of evolutionary algorithms comes from genetic variation in nature, the natural law that maintains biodiversity. Inspired by this, we select winning sequences from the beam search output space and then simulate gene recombination and mutation to generate more candidate sequences, thereby enhancing the diversity of NMT decoding. Experimental results on the WMT'18 English-German news translation task show that, compared with the traditional back-translation data augmentation method, our method improves translation quality by up to 0.5 BLEU points.
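The crossover-and-mutation step described above can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names (`crossover`, `mutate`, `expand_candidates`), the single-point crossover, the uniform random choice of parents, and the mutation rate are all assumptions made for illustration, and the abstract does not say how winning sequences are selected or how new candidates are scored.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: a prefix of parent_a joined to a suffix of parent_b."""
    shortest = min(len(parent_a), len(parent_b))
    if shortest < 2:
        # Too short to cut; return a copy of the first parent.
        return list(parent_a)
    cut = rng.randint(1, shortest - 1)  # inclusive bounds
    return list(parent_a[:cut]) + list(parent_b[cut:])


def mutate(tokens, vocab, rate, rng):
    """Replace each token with a random vocabulary item with probability `rate`."""
    return [rng.choice(vocab) if rng.random() < rate else tok for tok in tokens]


def expand_candidates(beam, vocab, n_children, rate=0.1, seed=0):
    """Return the beam hypotheses plus up to n_children new, distinct sequences.

    Assumption: parents are drawn uniformly from the beam. A real system
    would more likely prefer higher-scoring hypotheses and rescore children.
    """
    if len(beam) < 2:
        return list(beam)
    rng = random.Random(seed)
    seen = {tuple(hyp) for hyp in beam}
    children = []
    attempts = 0
    # Bound the number of attempts so the loop ends even if few new
    # sequences can be produced from this beam.
    while len(children) < n_children and attempts < 50 * n_children:
        attempts += 1
        parent_a, parent_b = rng.sample(beam, 2)
        child = mutate(crossover(parent_a, parent_b, rng), vocab, rate, rng)
        if tuple(child) not in seen:
            seen.add(tuple(child))
            children.append(child)
    return list(beam) + children


if __name__ == "__main__":
    beam = [
        ["the", "cat", "sat", "on", "the", "mat"],
        ["a", "cat", "was", "sitting", "on", "a", "rug"],
        ["the", "kitten", "sits", "on", "the", "carpet"],
    ]
    vocab = sorted({tok for hyp in beam for tok in hyp})
    for candidate in expand_candidates(beam, vocab, n_children=4):
        print(" ".join(candidate))
```

The sketch only enlarges the candidate pool. Deciding which of the new sequences are fluent enough to keep as synthetic source sentences would need a scoring step, which is left out here.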
Keywords/Search Tags:Neural Machine Translation, Data Diversity, Back-Translation, Fluency Boost Learning, Evolutionary Algorithms