Research On Neural Machine Translation Under Certain Low Resource Conditions

Posted on:2021-05-31

Degree:Master

Type:Thesis

Country:China

Candidate:M Tan

Full Text:PDF

GTID:2428330605474777

Subject:Computer Science and Technology

Abstract/Summary:

With the continuing development of the economy, science, and technology, the demand for translation between languages is increasing. Machine translation has become a common way to handle large volumes of translation work. In recent years, neural machine translation has become the mainstream approach to machine translation because of its excellent translation performance. Training a neural machine translation model requires a large amount of parallel corpora, and the quality, quantity, and domain of the corpora strongly influence model performance. In practical research settings, however, a shortage of data resources is common: parallel corpora are scarce for many domains and language pairs. To address this problem, this thesis proposes several methods for neural machine translation under low-resource conditions, with the aim of improving translation quality under those conditions.

(1) To address translation when in-domain corpora are scarce, this thesis proposes a domain adaptation method for neural machine translation based on domain features. In the low-domain-resource setting, out-of-domain corpora are usually abundant while in-domain corpora are scarce, and the abundant out-of-domain resources can help improve translation quality in the domain with scarce resources. First, a multilayer perceptron is used to train a domain discriminator, which automatically judges the domain category of a sentence. Second, the model's objective function is modified, and the generator and the domain discriminator are trained jointly to obtain a feature-sensitive network and a feature-insensitive network, respectively. Finally, ensemble learning is used to combine the generator, the feature-sensitive network, and the feature-insensitive network for translation prediction. Experiments were conducted on English-Chinese radio dialogue and English-German spoken language. The results show that the method can effectively learn domain features and improve translation quality in low-resource domains.

(2) To address translation when parallel corpora are scarce, this thesis proposes a low-resource neural machine translation method that uses bilingual dictionaries. When only monolingual resources are available, the method uses bilingual dictionaries to replace words in the monolingual corpora so that the multiple monolingual corpora contain only one language, which allows word embedding vectors to be shared, and then trains the translation model. The training corpora are constructed with real bilingual dictionaries and with conventional bilingual dictionaries, respectively; the conventional bilingual dictionaries can be obtained by vector similarity. The translation model is then trained with a denoising autoencoder and back-translation. Experiments were conducted on translation tasks for related and unrelated language pairs. The experimental results show that using bilingual dictionaries together with monolingual corpora can effectively improve low-resource translation quality.

(3) To improve the performance of low-resource translation models, this thesis studies system combination in neural machine translation. Ensemble learning is a common method for improving the predictive ability of models in machine learning, and in machine translation it is generally applied at the decoding stage. In this thesis, ensemble learning is applied to the model training process: five feature combination methods are proposed and applied to N-1, N-N, and 1-N combination systems, respectively. The combination systems are modeled with recurrent neural networks and attention networks, respectively, and the system combination method is evaluated on Chinese-to-English translation tasks. Experimental results show that the system combination method can effectively improve the performance of low-resource translation models.

In summary, to address the problems of scarce in-domain resources and scarce parallel corpora in machine translation, this thesis proposes learning domain features and using bilingual dictionaries with monolingual corpora, respectively, to improve low-resource translation quality. Finally, it studies system combination in neural machine translation to improve the overall performance of the translation model.
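The dictionary-based replacement described in (2) can be illustrated with a minimal sketch. The abstract does not give the replacement procedure, so this assumes simple token-level lookup; the function name `replace_with_dictionary` and the toy German-to-English dictionary are hypothetical and are not taken from the thesis.

```python
def replace_with_dictionary(tokens, dictionary):
    """Replace each token that has a dictionary entry with its translation.

    Tokens without an entry are kept unchanged (an assumption of this
    sketch; the thesis may handle unknown words differently).
    """
    return [dictionary.get(token, token) for token in tokens]


# Hypothetical toy bilingual dictionary (German -> English).
toy_dictionary = {"das": "the", "haus": "house", "ist": "is", "klein": "small"}

# A monolingual German sentence, tokenized on whitespace.
sentence = "das haus ist sehr klein".split()

# After replacement, most tokens are in English, so this corpus can share
# a vocabulary (and word embeddings) with an English monolingual corpus.
replaced = replace_with_dictionary(sentence, toy_dictionary)
print(" ".join(replaced))
```

Under this assumption, words missing from the dictionary ("sehr" here) stay in the original language, which is one reason dictionary coverage would matter for such a method.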

Keywords/Search Tags:

neural machine translation, low resource, domain adaptation, bilingual dictionary, system combination

Related items

1	Domain Adaptation For Statistical Machine Translation
2	Research On Chinese-Myanmar Neural Machine Translation Method Integrating Bilingual Dictionary
3	Research On Domain Adaptation Methods For Neural Machine Translation
4	Research On Semantics Analysis-based Domain Adaptation Reinforcement Method For Machine Translation
5	Research On Domain Adaptation For Neural Machine Translation
6	Research On Domain Adaptation For Statistical Machine Translation
7	Domain Adaptation For Statistical Machine Translation
8	Exploring Method Of Domain Adaptation For Statistical Machine Translation
9	Academic Bilingual Resource Research Based On Web Paper Library
10	Bilingual Parallel Corpus Filtering Method Based On Siamese XLM-R Neural Networks And Feature Fusion In Machine Translation