Research On Mongolian-Chinese Machine Translation Based On Generative Method

Posted on:2022-07-06

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Guo

GTID:2518306509960049

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, neural machine translation has gradually become the leading translation approach. Although it can automatically learn the semantic information in sentences, decoding relies on unidirectional context, which introduces latency when predicting target words and easily causes exposure bias. In addition, the scarcity of Mongolian-Chinese parallel corpora and the complexity of Mongolian word formation restrict improvements in translation performance. How to improve the quality of generated translations with a limited parallel corpus is therefore a direction that needs study. In response to these problems, this thesis conducts the following research.

First, to address the scarcity of parallel corpora, this thesis applies byte pair encoding (BPE) to Mongolian to obtain sub-word units whose granularity lies between morphemes and words, alleviating the problems of low-frequency words and wasted resources caused by sparse corpora. Vectorizing units of suitable granularity is also a focus of this thesis. Traditional word vector methods have weak representational ability and cannot handle polysemous words, because each vector is fixed once the mapping is learned. To this end, this thesis uses a lite BERT model to learn features and train dynamic word vectors for Mongolian and Chinese.

Second, to address decoding latency and exposure bias, this thesis introduces a non-autoregressive model based on generative ideas for the Mongolian-Chinese machine translation task. The model maps the source language into a latent variable space rich in semantic information, so that each target word is predicted only from the distribution given by the latent variable. This removes the dependence on previously generated target words and allows the translation to be decoded in parallel. The thesis also gives solutions to two problems that arise in parallel decoding: predicting the latent variable distribution and predicting the target sequence length. For the latent variable distribution, this thesis takes the hidden states produced by the neural machine translation encoder and combines them with an external fertility module to generate the probability distribution of the latent variable. For the target sequence length, this thesis uses a source sequence copy mechanism: a length predictor module predicts the number of target words each source word may translate into, the source sequence is expanded by copying each source word that many times, and the expanded sequence is used as the decoder input to achieve more accurate parallel decoding. In addition, this thesis uses knowledge distillation to obtain distilled data from an autoregressive model to guide the training of the non-autoregressive model, which in essence reduces the complexity of the training data.

Third, to address pronoun dropping and coreference bias in the generated translations, this thesis uses neural networks to learn the features of coreference relations within Chinese text and computes coreference scores for candidate pairs that may corefer. The coreference relations obtained within a sequence are then used as constraints to guide the decoding process of the generative model and improve the translation.

This thesis conducts experiments on the CWMT2018 Mongolian-Chinese dataset. The experimental results show that the generative method significantly improves the decoding speed of the Mongolian-Chinese machine translation system, and that adding coreference resolution improves the translations produced by the generative system without introducing significant decoding latency.
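The source sequence copy step described in the abstract can be illustrated with a minimal sketch. It assumes the per-word copy counts (fertilities) have already been predicted; the function names `expand_by_fertility` and `target_length` and the placeholder tokens are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def expand_by_fertility(source_tokens: Sequence[str],
                        fertilities: Sequence[int]) -> List[str]:
    """Copy each source token as many times as its predicted fertility.

    The expanded sequence is what a fertility-based non-autoregressive
    decoder would receive as input. Names here are illustrative only.
    """
    if len(source_tokens) != len(fertilities):
        raise ValueError("one fertility value is required per source token")
    if any(f < 0 for f in fertilities):
        raise ValueError("fertilities must be non-negative")

    expanded: List[str] = []
    for token, fertility in zip(source_tokens, fertilities):
        # A fertility of 0 drops the token; a fertility of k repeats it k times.
        expanded.extend([token] * fertility)
    return expanded


def target_length(fertilities: Sequence[int]) -> int:
    """The target length implied by the fertilities is simply their sum."""
    return sum(fertilities)


if __name__ == "__main__":
    # Placeholder tokens and fertilities, assumed for illustration.
    tokens = ["src1", "src2", "src3"]
    fert = [2, 0, 1]
    print(expand_by_fertility(tokens, fert))
    print(target_length(fert))
```

Because the decoder input length is fixed by the sum of the fertilities, every target position can be predicted at once rather than one word at a time.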

Keywords/Search Tags:

Mongolian-Chinese Neural Machine Translation, Generative Method, Non-autoregressive Model, Coreference Resolution

Related items

1	Research Of Optimization Methods Integration And Translation Rerank For Mongolian-Chinese Machine Translation
2	A Study On Mongolian-Chinese Machine Translation Based On Neural Network
3	Research On Semi-supervised Mongolian-Chinese Neural Machine Translation Based On Cooperative Training
4	Multi-granularity Mongolian-Chinese Neural Network Machine Translation Research
5	Research And Implementation Of Optimization And Post-Processing Technology In Chinese To Mongolian Neural Machine Translation
6	Implementation And Optimization Of Mongolian-Chinese Neural Machine Translation System Based On Dual Learning
7	A Study On Statistical And Rule-Based Combined Mongolian-Chinese Machine Translation
8	Information Technology In Education-oriented Chinese-Mongolian Machine Translation Research
9	Mongolian-Chinese Neural Machine Translation Based On The Features Of Statistical Machine Translation
10	Mongolian Word Segmentation Method Based On Neural Network And Its Application