
Mongolian-Chinese Neural Machine Translation Based On Reinforcement Learning

Posted on: 2021-01-19    Degree: Master    Type: Thesis
Country: China    Candidate: T G Bai    Full Text: PDF
GTID: 2428330620476432    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, research on neural machine translation has made remarkable progress with the rapid development of deep learning. The study of Mongolian-Chinese machine translation, however, started late: the parallel corpus is relatively small and data sparsity is a serious problem. This thesis carries out research in two directions. First, it addresses the mismatch between the training loss function and the translation evaluation metric, as well as the exposure bias caused by the different distributions seen during training and testing; we introduce the idea of reinforcement learning, adopt reward mechanisms of different kinds to resolve the inconsistency, and try various decoding methods to mitigate exposure bias. Second, it addresses the data sparsity that makes model training difficult; we introduce a method of initializing the translation model's parameters with Mongolian sub-word vectors and a data augmentation method that randomly adds noise to enlarge the Mongolian-Chinese parallel corpus. The specific methods are as follows.

First, we introduce the idea of reinforcement learning into Mongolian-Chinese neural machine translation. We define reward functions at different levels (word-level and sequence-level) and combine the reward function with the cross-entropy loss linearly in different proportions. The best results are obtained with 40% sequence-level reward and 60% cross-entropy loss. We also compare decoding methods such as beam search and scheduled sampling; experiments show that scheduled sampling gives better results.

Second, we propose a method for generating Mongolian sub-word vectors. Using the BPE segmentation algorithm, sub-word vectors are trained on a large-scale monolingual corpus at the same granularity as the parallel corpus, and the resulting vectors are used to initialize the parameters of the translation model. This method effectively relieves the sparse distribution of word vectors in the vector space, improves the quality of the word vectors when different corpora are used for training, and in turn improves the quality of the translation model.

Third, targeting the Mongolian-Chinese parallel corpus, we propose a method of randomly adding noise to the source data and compare it with back translation. The results show that, given the scarcity of Mongolian-Chinese parallel data, both data augmentation methods improve translation quality in Mongolian-Chinese neural machine translation, and back translation performs better because it introduces more monolingual data.

The experiments in this thesis are based on the CWMT2018 training set. They show that using sub-word vectors to provide prior information for the translation model increases BLEU by up to 1.79 percentage points; the reinforcement-learning-based Mongolian-Chinese neural machine translation improves BLEU by up to 0.6 percentage points over the baseline model; and data augmentation improves BLEU by 1.1 percentage points over the baseline.
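To make the reward mixing concrete, the following is a minimal PyTorch sketch of combining a sequence-level reward with cross-entropy at the 40%/60% proportion reported in the abstract. The `model` interface, its `sample` method, and the `compute_bleu` scorer are hypothetical placeholders, not the thesis' actual implementation.

```python
import torch
import torch.nn.functional as F

def mixed_loss(model, src, tgt, compute_bleu, alpha=0.4):
    """loss = alpha * sequence-level RL loss + (1 - alpha) * cross-entropy."""
    # Cross-entropy on the reference (teacher forcing).
    logits = model(src, tgt[:, :-1])                       # (batch, len, vocab)
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         tgt[:, 1:].reshape(-1))

    # Sample a translation and score it with a sentence-level reward (BLEU).
    sampled, log_probs = model.sample(src)                 # hypothetical sampling API
    reward = torch.tensor([compute_bleu(hyp, ref) for hyp, ref in zip(sampled, tgt)],
                          device=ce.device)
    rl = -(reward * log_probs.sum(dim=1)).mean()           # REINFORCE-style loss

    # The abstract reports the best results at 40% reward / 60% cross-entropy.
    return alpha * rl + (1 - alpha) * ce
```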
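Scheduled sampling, which the abstract reports as the better decoding strategy, can be sketched as follows: at each decoding step the model's own prediction is fed back with some probability instead of the gold token. The step-wise `decoder_step` API is a placeholder assumption.

```python
import torch

def scheduled_sampling_decode(decoder_step, state, tgt, sampling_prob=0.25):
    """Feed the model's own prediction with probability `sampling_prob`,
    otherwise the gold token (teacher forcing), to reduce exposure bias."""
    inputs = tgt[:, 0]                         # start token
    outputs = []
    for t in range(1, tgt.size(1)):
        logits, state = decoder_step(inputs, state)
        outputs.append(logits)
        pred = logits.argmax(dim=-1)
        # Mix model predictions into the next input.
        use_pred = torch.rand(pred.size(0), device=pred.device) < sampling_prob
        inputs = torch.where(use_pred, pred, tgt[:, t])
    return torch.stack(outputs, dim=1)
```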
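The sub-word vector initialization can be illustrated with a short sketch: train vectors on BPE-segmented monolingual Mongolian text and copy them into the translation model's embedding matrix. The abstract does not name a specific toolkit, so the use of gensim's Word2Vec here is an illustrative assumption.

```python
import torch
from gensim.models import Word2Vec

def build_pretrained_embedding(bpe_corpus_path, vocab, dim=512):
    """Train sub-word vectors on monolingual data and use them to
    initialize the NMT model's embedding matrix."""
    # One BPE-segmented sentence per line, tokens separated by spaces.
    with open(bpe_corpus_path, encoding="utf-8") as f:
        sentences = [line.split() for line in f]
    w2v = Word2Vec(sentences=sentences, vector_size=dim, min_count=1, epochs=10)

    embedding = torch.randn(len(vocab), dim) * 0.01        # fallback for unseen units
    for subword, idx in vocab.items():
        if subword in w2v.wv:
            embedding[idx] = torch.tensor(w2v.wv[subword])
    return embedding  # copied into nn.Embedding.weight before NMT training
```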
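Finally, the source-side noising used for data augmentation could look like the sketch below, which creates additional (noisy source, clean target) pairs by dropping, masking, or locally swapping tokens. The specific noise types and rates are illustrative assumptions, not the thesis' settings.

```python
import random

def add_noise(tokens, drop_prob=0.1, blank_prob=0.1, swap_prob=0.1,
              blank_token="<blank>"):
    """Randomly drop, mask, or swap neighbouring source tokens."""
    out = []
    for tok in tokens:
        r = random.random()
        if r < drop_prob:
            continue                        # drop the token
        elif r < drop_prob + blank_prob:
            out.append(blank_token)         # replace with a blank/mask token
        else:
            out.append(tok)
    # Swap one random adjacent pair to perturb local word order.
    if len(out) > 1 and random.random() < swap_prob:
        i = random.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out
```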
Keywords/Search Tags: Mongolian-Chinese Neural Machine Translation, Mongolian Sub-Word Vector, Reinforcement Learning, Data Augmentation