
Research On Mongolian-Chinese Neural Machine Translation Incorporating Prior Knowledge

Posted on: 2022-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: R Pang
GTID: 2518306542976549
Subject: Master of Engineering
Abstract/Summary:
Machine translation is a key technology in the field of natural language processing. With the rapid development of deep learning in recent years, research on neural machine translation (NMT) has made remarkable progress. However, as a data-driven technique, NMT struggles to produce ideal results on low-resource tasks such as Mongolian-Chinese translation. Translation quality for low-resource languages can be improved through data augmentation or transfer learning. This thesis adopts prior-knowledge integration methods to alleviate the scarcity of Mongolian-Chinese parallel corpora and to strengthen the NMT model's ability to model different types of linguistic information. Three types of prior-knowledge integration are investigated separately:

1. Target-side syntactic information as prior knowledge. The Chinese corpus on the target side is first parsed, and each constituency tree is then transformed into a linearized syntactic tree, i.e. a token sequence that can be fed to the NMT decoder as input (a minimal linearization sketch follows the abstract). Finally, a reordering score is used as an index of how much syntactic knowledge the translation model has learned. Experimental results show that the translations generated by this model have stronger grammatical structure.

2. Mongolian-Chinese parallel phrase pairs as prior knowledge. Phrase pairs are first extracted from the parallel corpus and supplemented with pairs gathered from external language resources, and phrases occurring in each sentence are tagged. The decoder has a Word Mode and a Phrase Mode: for source segments tagged as phrases, it can either translate one word at a time or generate the corresponding target phrase as a whole (see the second sketch below). This method not only incorporates external knowledge into NMT but also extends the word-by-word generation mechanism of the recurrent neural network.

3. A trained statistical machine translation (SMT) model as prior knowledge. SMT and NMT systems are first trained separately on the Mongolian-Chinese parallel corpus. At each decoding step, the SMT system then provides translation recommendations for the current time step, conditioned on what the NMT decoder has generated so far, and a gate mechanism decides whether to adopt them (see the third sketch below). Experimental results show that incorporating SMT recommendations effectively improves translation quality.

Experiments were carried out on a Mongolian-Chinese parallel corpus of 500,000 sentence pairs, and all three prior-knowledge integration methods improve the performance of the translation model. Integrating syntactic structure information raises BLEU by 0.28, integrating external phrase information by 1.27, and integrating SMT recommendations by 1.64; further using the SMT recommendations to replace unknown (out-of-vocabulary) words raises BLEU by 2.18.
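As a concrete illustration of method 1, the sketch below linearizes a bracketed constituency tree into a flat sequence of opening tags, terminal words, and closing tags that a sequence decoder can consume. The bracketed input format and the <NT>/</NT> tag convention are assumptions made for the sketch, not necessarily the exact encoding used in the thesis.

```python
# Minimal sketch: flatten a bracketed constituency tree into a token
# sequence suitable as decoder input. The tag convention is illustrative.

def linearize(tree: str) -> list[str]:
    """Turn a bracketed parse tree, e.g. "(S (NP 他) (VP 来 了))", into
    ['<S>', '<NP>', '他', '</NP>', '<VP>', '来', '了', '</VP>', '</S>']."""
    tokens, stack = [], []
    for piece in tree.replace('(', ' ( ').replace(')', ' ) ').split():
        if piece == '(':
            stack.append(None)                 # placeholder until the label arrives
        elif piece == ')':
            tokens.append(f'</{stack.pop()}>')  # emit closing tag for this constituent
        elif stack and stack[-1] is None:
            stack[-1] = piece                  # first symbol after '(' is the label
            tokens.append(f'<{piece}>')
        else:
            tokens.append(piece)               # terminal word
    return tokens

if __name__ == '__main__':
    print(linearize('(S (NP 他) (VP 来 了))'))
```

Because opening and closing tags are ordinary tokens, the decoder can be trained on these sequences without architectural changes, which is what makes the linearized-tree representation attractive for sequence models.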
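The second sketch gives a toy view of the two-mode decoder in method 2. It is a deliberate simplification: the real decoder is autoregressive and the mode decision is learned from the hidden state, whereas here the source is pre-segmented into (segment, is_phrase) pairs and the mode gate, word-level translator, and phrase table are injected stand-ins; all token names are hypothetical.

```python
# Toy sketch of Word Mode vs. Phrase Mode decoding. A real model scores
# the mode choice from the decoder state; here gate, translator, and
# phrase table are stand-in callables and dictionaries.

def decode(tagged_source, translate_word, phrase_table, use_phrase_mode):
    """tagged_source: list of (segment, is_phrase) pairs from the tagger."""
    output = []
    for segment, is_phrase in tagged_source:
        if is_phrase and segment in phrase_table and use_phrase_mode(output):
            output.extend(phrase_table[segment])   # Phrase Mode: emit pair whole
        else:
            for word in segment.split():           # Word Mode: word by word
                output.append(translate_word(word, output))
    return output

if __name__ == '__main__':
    table = {'mgl_phrase': ['乌兰', '巴托']}         # hypothetical phrase pair
    source = [('mgl_w1', False), ('mgl_phrase', True), ('mgl_w2', False)]
    toy_word = lambda w, out: {'mgl_w1': '我', 'mgl_w2': '去'}.get(w, '<unk>')
    print(decode(source, toy_word, table, use_phrase_mode=lambda out: True))
    # -> ['我', '乌兰', '巴托', '去']
```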
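The third sketch shows the gating idea of method 3 at a single decoding step: the final output distribution interpolates the NMT softmax with a normalized distribution over SMT-recommended words. The fixed scalar gate and the toy scores are assumptions; in the thesis the gate would be computed from the decoder state.

```python
# Sketch of gated SMT recommendations at one decoding step: interpolate
# the NMT distribution with a distribution over SMT-recommended words.

import numpy as np

def gated_step(p_nmt: np.ndarray, smt_recs: dict[int, float],
               gate: float) -> int:
    """p_nmt: NMT softmax over the vocabulary at this time step.
    smt_recs: {vocab_id: score} recommendations from the SMT system,
    conditioned on what the NMT decoder has generated so far.
    gate in [0, 1]: 1.0 trusts NMT fully, 0.0 trusts SMT fully."""
    p_smt = np.zeros_like(p_nmt)
    if smt_recs:
        ids = np.array(list(smt_recs))
        scores = np.array([smt_recs[i] for i in ids])
        p_smt[ids] = np.exp(scores) / np.exp(scores).sum()  # softmax over recs
    p_final = gate * p_nmt + (1.0 - gate) * p_smt
    return int(p_final.argmax())

if __name__ == '__main__':
    p_nmt = np.array([0.1, 0.5, 0.2, 0.2])
    recs = {2: 1.5, 3: 0.5}                 # SMT recommends ids 2 and 3
    print(gated_step(p_nmt, recs, gate=0.4))  # SMT pushes the choice to id 2
```

The same interpolation also explains the UNK-replacement variant reported above: when the NMT distribution concentrates on the unknown token, a low gate value lets an SMT-recommended word win the argmax instead.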
Keywords/Search Tags: Mongolian-Chinese Neural Machine Translation, Attention Mechanism, Prior Knowledge Incorporation, Statistical Machine Translation