
Multi-granularity Mongolian-Chinese Neural Network Machine Translation Research

Posted on: 2019-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: H B Wang
Full Text: PDF
GTID: 2428330563956751
Subject: Software engineering

Abstract/Summary:
The limited training corpora available for neural machine translation (NMT) inevitably lead to data sparseness. Because Mongolian-Chinese parallel corpora are scarce, data sparseness is especially severe in Mongolian-Chinese MT. Word segmentation techniques have been widely used in NMT for Western languages and have proven effective. This dissertation studies the translation performance of Mongolian-Chinese NMT models under several Mongolian segmentation granularities, using three architectures: the recurrent NMT model, the convolutional NMT model, and the Transformer model. The experiments show that Mongolian sub-words achieve the best translation performance, followed by Mongolian words and then Mongolian characters, indicating that a suitable degree of segmentation improves MT quality.

The dissertation then analyzes the characteristics of character-level Mongolian MT. First, the character sequence is 4-5 times as long as the original sentence, which makes long-distance dependencies harder to learn. Second, once a Mongolian word is split into characters, the boundaries between words disappear, and the model must first learn which character sequences were cut from the same word. Third, the experiments show that some encoder outputs of the character-level model receive very small weights in the decoder, so this part of the information should be filtered within the encoder.

Based on these characteristics, two improvements to the character-level model are proposed. The first is to add convolution layers that learn local information in the character sequence; the second is to add a gated linear unit that filters the output of the convolution layers. The improved character-level model achieves 41.07 BLEU, which is 6.64 BLEU higher than the baseline character-level model, 3.84 BLEU higher than the sub-word-level model, and 5.21 BLEU higher than the word-level model. Applying the same improvement to the character-level recurrent NMT model raises its score by 3.54 BLEU.
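The dissertation abstract gives no implementation details, but the proposed improvement can be illustrated as follows. The sketch below (PyTorch; all class and parameter names are hypothetical, not the author's) shows a minimal encoder front-end in which a 1-D convolution learns local patterns over character embeddings and a gated linear unit (GLU) filters the convolution output before it reaches the main encoder stack:

```python
import torch
import torch.nn as nn

class CharConvGLUFrontEnd(nn.Module):
    """A minimal sketch of the described idea: convolution over
    character embeddings to capture local (word-internal) patterns,
    followed by a gated linear unit that filters the conv output."""

    def __init__(self, emb_dim=256, kernel_size=3):
        super().__init__()
        # The conv produces 2x channels because the GLU halves the
        # channel dimension: one half is the signal, the other half
        # parameterizes the gate. Padding preserves sequence length.
        self.conv = nn.Conv1d(emb_dim, 2 * emb_dim, kernel_size,
                              padding=kernel_size // 2)
        self.glu = nn.GLU(dim=1)

    def forward(self, char_embeddings):
        # char_embeddings: (batch, seq_len, emb_dim)
        x = char_embeddings.transpose(1, 2)  # (batch, emb_dim, seq_len)
        x = self.glu(self.conv(x))           # gate filters conv features
        return x.transpose(1, 2)             # (batch, seq_len, emb_dim)

# Example usage with dummy data: a batch of 8 sequences of 120 characters.
frontend = CharConvGLUFrontEnd(emb_dim=256)
chars = torch.randn(8, 120, 256)
out = frontend(chars)   # same shape, locally contextualized and gated
```

The gating step matches the abstract's motivation: since some character-level encoder outputs receive very small decoder weights, a learned gate lets the encoder suppress uninformative positions before they are passed on.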
Keywords/Search Tags:Mongolian-Chinese Machine Translation, Neural Machine Translation, Character-level Machine Translation