Research on Mongolian and Chinese Neural Machine Translation Based on Grammar Supervision and Deep Reinforcement Learning

Posted on: 2022-07-25 | Degree: Master | Type: Thesis
Country: China | Candidate: Y H Guo
GTID: 2518306542976559 | Subject: Master of Engineering
Abstract/Summary:
Neural machine translation (NMT) has developed rapidly in recent years and has produced a large body of research. By exploiting the learning and generalization abilities of neural networks, NMT has greatly improved translation quality. However, NMT depends heavily on the quantity and quality of bilingual parallel corpora, which to some extent limits progress in low-resource machine translation. Using limited corpora and suitable translation methods to improve the quality of low-resource translation models is therefore an important research direction. Among the many less widely spoken languages, Mongolian is used relatively widely, and it is an official language of the Inner Mongolia Autonomous Region. Because language is at the core of cultural exchange, research on Mongolian-Chinese NMT is significant for national cultural exchange and dissemination. It also helps advance research on NMT for resource-scarce languages.

This thesis uses BPE (Byte Pair Encoding) to preprocess and encode the data. It adopts the non-autoregressive Transformer as the translation framework and improves that model to raise translation quality. It then integrates grammar identification information for the target language into the model. Finally, it applies deep reinforcement learning, using sequence-level information as the training objective, to optimize the model and improve its translation performance. The main work is as follows:

(1) BPE is used to preprocess the Mongolian-Chinese parallel corpus, producing a BPE-encoded, aligned Mongolian-Chinese corpus. Stanford CoreNLP is then used to syntactically parse the Chinese side of the corpus in preparation for the subsequent translation tasks.

(2) An external parser generates ground-truth grammar identification blocks for the target language. A block algorithm then converts them into a sequence of grammar identification blocks. A grammatical parsing decoder, integrated into the non-autoregressive Transformer translation model, predicts this sequence of grammar identification blocks.

Supervision from the grammar identification information gives the final translation better grammatical structure and makes the translation model more interpretable. It improves translation quality while retaining the speed advantage of non-autoregressive translation.

In addition, convolutional neural networks (CNNs) learn and extract a sentence-level topic context from each source sentence. These sentence-level features are supplied to the grammatical parsing decoder. Mongolian is a morphologically rich language, so this CNN sentence context topic attention module lets the model capture more information from the source language.

(3) Deep reinforcement learning is used to fine-tune and optimize the model with a sequence-level objective. The BLEU score serves as the reward, the NMT model acts as the agent, and the model parameters are updated iteratively. This encourages the model to generate high-quality whole sentences rather than merely the correct token at each position. The method can effectively reduce word repetition and further improves the translation quality of the model.
Keywords/Search Tags: neural machine translation, grammar supervision, CNNs sentence context topic attention module, deep reinforcement learning
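Step (1) relies on BPE, but the abstract does not say which implementation or how many merge operations were used. The following minimal sketch illustrates the classic merge-learning loop. It repeatedly merges the most frequent adjacent symbol pair and then applies the learned merges to a new word. The function names, the `</w>` end-of-word marker, and the toy word counts are illustrative assumptions, not the thesis's setup.

```python
import re
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every standalone occurrence of 'a b' with 'ab' in each word."""
    merged = "".join(pair)
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub(lambda _m: merged, word): freq for word, freq in vocab.items()}

def learn_bpe(word_counts, num_merges):
    """Learn up to num_merges merge operations (hypothetical helper)."""
    # Each word starts as space-separated characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)
print(merges)                       # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(apply_bpe("lowest", merges))  # ['l', 'o', 'w', 'est</w>']
```

The unseen word "lowest" is split into known subwords, which is why BPE helps with a morphologically rich language such as Mongolian. Production systems would use a dedicated tool rather than this loop.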
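The abstract does not spell out the block algorithm in step (2). One plausible scheme, assumed here, walks a target-side constituency tree top-down with a maximum block size k. It emits the label and token count of the highest nodes that cover at most k tokens, giving sequences such as `NP1 VV1 NP3`. The tree encoding, the function names, and the hand-written example parse are hypothetical. The parse is not CoreNLP output.

```python
from typing import List, Tuple, Union

# A node is (label, children). Tokens are plain strings and are assumed to be
# wrapped in part-of-speech preterminals, e.g. ("NN", ["书"]).
Node = Tuple[str, list]

def span_len(node: Union[Node, str]) -> int:
    """Number of tokens covered by a node."""
    if isinstance(node, str):
        return 1
    return sum(span_len(child) for child in node[1])

def grammar_blocks(node: Node, k: int) -> List[Tuple[str, int]]:
    """Emit (label, size) for the highest nodes covering at most k tokens."""
    if k < 1:
        raise ValueError("k must be at least 1")
    size = span_len(node)
    if size <= k:
        return [(node[0], size)]
    blocks: List[Tuple[str, int]] = []
    for child in node[1]:
        if isinstance(child, str):
            raise ValueError("tokens must be wrapped in preterminals")
        blocks.extend(grammar_blocks(child, k))
    return blocks

def block_string(blocks: List[Tuple[str, int]]) -> str:
    return " ".join(f"{label}{size}" for label, size in blocks)

# Illustrative parse of "我 喜欢 这 本 书" ("I like this book").
example_tree: Node = ("IP", [
    ("NP", [("PN", ["我"])]),
    ("VP", [
        ("VV", ["喜欢"]),
        ("NP", [
            ("DP", [("DT", ["这"]), ("CLP", [("M", ["本"])])]),
            ("NP", [("NN", ["书"])]),
        ]),
    ]),
])

print(block_string(grammar_blocks(example_tree, 3)))  # NP1 VV1 NP3
print(block_string(grammar_blocks(example_tree, 2)))  # NP1 VV1 DP2 NP1
```

In this sketch the block sizes always sum to the sentence length. A block sequence therefore also fixes how many target tokens each block must produce, which fits naturally with parallel, non-autoregressive decoding.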
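Step (2) also mentions CNNs that extract a sentence-level feature vector from the source sentence. The abstract gives no architecture details, such as filter widths or how the vector enters the topic attention module. The sketch below therefore shows only a standard CNN sentence encoder: a 1-D convolution over token embeddings, a ReLU, and max-over-time pooling. The names and toy numbers are assumptions, and plain Python stands in for a deep learning framework.

```python
from typing import List

Matrix = List[List[float]]

def conv_maxpool_features(embeddings: Matrix, filters: List[Matrix],
                          biases: List[float]) -> List[float]:
    """
    Sentence-level features from a 1-D convolution over token embeddings.

    embeddings: T rows, one d-dimensional vector per source token
    filters:    F filters, each a w x d matrix spanning w consecutive tokens
    biases:     one bias per filter
    Returns F values: ReLU(convolution) max-pooled over all window positions.
    """
    features = []
    for filt, bias in zip(filters, biases):
        width = len(filt)
        if len(embeddings) < width:
            raise ValueError("sentence is shorter than the filter width")
        best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe floor
        for start in range(len(embeddings) - width + 1):
            score = bias
            for offset, weights in enumerate(filt):
                row = embeddings[start + offset]
                score += sum(x * w for x, w in zip(row, weights))
            best = max(best, max(0.0, score))
        features.append(best)
    return features

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]         # toy 2-d embeddings
filters = [[[1.0, 0.0], [0.0, 1.0]],                  # width-2 filter
           [[-1.0, 0.0], [0.0, -1.0]]]                # never fires after ReLU
print(conv_maxpool_features(tokens, filters, [0.0, 0.0]))  # [2.0, 0.0]
```

Max-over-time pooling yields one value per filter regardless of sentence length. This fixed-size vector is what makes the output usable as a sentence-level context feature for a decoder.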
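Step (3) treats the NMT model as an agent rewarded with BLEU. The abstract does not name the reinforcement learning algorithm, so this sketch assumes plain REINFORCE with a baseline. Its reward is a sentence-level BLEU using add-one smoothing for n-gram orders above 1, a common choice that may differ from the thesis. In a real system the log-probabilities would come from the model's sampled translation and the loss would be backpropagated by a framework. Here they are plain floats, and the function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Smoothed sentence-level BLEU used as the reward (one possible variant)."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hyp, n)
        overlap = sum((hyp_ngrams & ngrams(ref, n)).values())  # clipped matches
        total = sum(hyp_ngrams.values())
        if n > 1:  # add-one smoothing for higher orders
            overlap, total = overlap + 1, total + 1
        if overlap == 0:
            return 0.0
        log_prec += math.log(overlap / total) / max_n
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_prec)

def reinforce_loss(sample_log_probs, reward, baseline):
    """REINFORCE loss for one sampled translation. Minimizing it raises the
    log-probability of samples whose reward exceeds the baseline."""
    advantage = reward - baseline
    return -advantage * sum(sample_log_probs)

ref = "the cat sat on the mat".split()
reward = sentence_bleu("the cat sat".split(), ref)   # exp(-1), about 0.368
loss = reinforce_loss([-0.1, -0.2, -0.3], reward, baseline=0.2)
```

The reward scores the whole sampled sentence, so a hypothesis that repeats words gains nothing from extra n-gram matches beyond the reference counts, because of count clipping. This is consistent with the abstract's claim that sequence-level training reduces word repetition. The baseline, for example a running average of rewards, only reduces gradient variance.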