Research On Key Technologies Of Neural Machine Translation For Low-resource Languages

Posted on: 2021-03-12 | Degree: Master | Type: Thesis
Country: China | Candidate: W Lai | Full Text: PDF
GTID: 2438330602498434 | Subject: Computer Science and Technology
Abstract/Summary:
Machine translation, the study of automatic translation from one language to another, is one of the important research directions of NLP. In recent years, with the rapid development of deep neural network technology, machine translation research in academia and industry has gradually shifted from traditional statistical machine translation to neural machine translation. When trained on large-scale, high-quality parallel corpora, neural machine translation has reached a level comparable to human translation on multiple translation tasks. However, with the exception of a few languages such as English and Chinese, most of the world's languages are low-resource: no large-scale parallel corpus exists between most language pairs, which poses a challenge for neural machine translation research.

This thesis explores the application of neural machine translation technology in low-resource language scenarios. To this end, it studies machine translation between Chinese and both the languages of countries along the "Belt and Road" and the languages of China's ethnic minorities, under three scenarios. The main contributions of this thesis are as follows.

First, to address data scarcity in low-resource neural machine translation when only a small parallel corpus is available, this thesis combines corpus alignment and grammatical error correction techniques to propose a data augmentation method based on a semantically related word replacement strategy. The method enlarges the parallel corpus in order to improve neural machine translation performance. Experimental results show that it performs well on translation tasks for multiple language pairs, with an improvement of up to 3.06 BLEU points.

Second, to address the case in which no parallel corpus exists between some language pairs, this thesis proposes an unsupervised neural machine translation model that uses large-scale monolingual data in the two languages, combined with bilingual parallel corpus mining and cross-lingual word embedding techniques. It studies machine translation performance on low-resource language pairs, especially distant language pairs. Experimental results show that the method improves the performance of unsupervised neural machine translation, with an improvement of up to 5.19 BLEU points.

Third, to address the case in which some languages have no parallel corpus with Chinese but do have one with English, this thesis uses English as a pivot language and combines dual learning with model fusion to propose a pivot-based neural machine translation fusion model that improves translation performance. Experimental results show that the method significantly improves performance on multilingual translation tasks, with an improvement of up to 16.31 BLEU points.

Experiments on machine translation between Chinese and 10 languages selected from the "Belt and Road" countries and China's minority languages (the low-resource languages involved are Mongolian, Tibetan, Uyghur, Arabic, Russian, Portuguese, Hindi, Estonian, Latvian, and Romanian) show that the three proposed methods improve translation performance to varying degrees over current machine translation methods. In addition, for comparison with current related techniques, experiments were also conducted on some resource-rich languages, which further verified the effectiveness of the proposed methods.
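The abstract does not spell out how the word replacement augmentation works, so the following is only a minimal sketch of the general idea under stated assumptions: given a sentence pair and its word alignment, one aligned source word is swapped for a semantically related word, and the aligned target word is swapped for that word's translation. The function `augment_pair`, the `src_related` table, and the `tgt_lexicon` dictionary are hypothetical names introduced for illustration, and the grammatical error correction step mentioned in the abstract is omitted.

```python
import random


def augment_pair(src_tokens, tgt_tokens, alignment, src_related, tgt_lexicon, rng):
    """Build one synthetic sentence pair by replacing a single aligned word pair.

    alignment   -- list of (source index, target index) pairs (assumed given)
    src_related -- hypothetical map: source word -> semantically related source words
    tgt_lexicon -- hypothetical map: source word -> its target-language translation
    Returns (new_src_tokens, new_tgt_tokens), or None if no replacement is possible.
    """
    candidates = []
    for i, j in alignment:
        for new_src in src_related.get(src_tokens[i], []):
            new_tgt = tgt_lexicon.get(new_src)
            # Only keep replacements whose translation is known, so that
            # the source and target sides stay parallel.
            if new_tgt is not None:
                candidates.append((i, j, new_src, new_tgt))
    if not candidates:
        return None

    i, j, new_src, new_tgt = rng.choice(candidates)
    new_src_tokens = list(src_tokens)
    new_tgt_tokens = list(tgt_tokens)
    new_src_tokens[i] = new_src
    new_tgt_tokens[j] = new_tgt
    return new_src_tokens, new_tgt_tokens


# Toy English-French example; all data here is illustrative only.
src = ["the", "cat", "sleeps"]
tgt = ["le", "chat", "dort"]
alignment = [(0, 0), (1, 1), (2, 2)]
src_related = {"cat": ["dog"]}
tgt_lexicon = {"dog": "chien"}

augmented = augment_pair(src, tgt, alignment, src_related, tgt_lexicon, random.Random(0))
print(augmented)
```

In a sketch like this, a one-for-one swap can break agreement or inflection on the target side, which is one plausible reason to follow the replacement with a grammatical error correction pass, as the abstract describes.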
Keywords/Search Tags:Neural Machine Translation, Low-Resource Languages, Data Augmentation, Unsupervised Learning, Pivot-Based Machine Translation