
Multi-granularity Mongolian-Chinese Neural Network Machine Translation Research

Posted on: 2019-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: H B Wang
Full Text: PDF
GTID: 2428330563956751
Subject: Software engineering

Abstract/Summary:
The limited training corpora available for neural machine translation (NMT) inevitably lead to data sparseness. Because Mongolian-Chinese parallel corpora are scarce, data sparseness is especially severe in Mongolian-Chinese MT. Word segmentation techniques have been widely used in NMT for Western languages and have proven effective. This dissertation studies the translation performance of Mongolian-Chinese NMT models under several Mongolian segmentation granularities, using three architectures: the recurrent NMT model, the convolutional NMT model, and the Transformer model. The experiments show that Mongolian sub-words achieve the best translation performance, followed by Mongolian words and then Mongolian characters, indicating that a suitable degree of segmentation improves MT quality.

The dissertation then analyzes the characteristics of character-level Mongolian MT. First, the character sequence is 4-5 times as long as the original sentence, which makes long-distance dependencies harder to learn. Second, once a Mongolian word is split into characters, the boundaries between words disappear, and the model must first learn which character sequences were cut from the same word. Third, the experiments show that some encoder outputs of the character-level model receive very small weights in the decoder, so this part of the information should be filtered within the encoder.

Based on these characteristics, two improvements to the character-level model are proposed. The first is to add convolution layers that learn local information in the character sequence; the second is to add a gated linear unit that filters the output of the convolution layers. The improved character-level model achieves 41.07 BLEU, which is 6.64 BLEU higher than the baseline character-level model, 3.84 BLEU higher than the sub-word-level model, and 5.21 BLEU higher than the word-level model. Applying the same improvement to the character-level recurrent NMT model raises its score by 3.54 BLEU.
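The dissertation abstract gives no implementation details, but the proposed improvement can be illustrated as follows. The sketch below (PyTorch; all class and parameter names are hypothetical, not the author's) shows a minimal encoder front-end in which a 1-D convolution learns local patterns over character embeddings and a gated linear unit (GLU) filters the convolution output before it reaches the main encoder stack:

```python
import torch
import torch.nn as nn

class CharConvGLUFrontEnd(nn.Module):
    """A minimal sketch of the described idea: convolution over
    character embeddings to capture local (word-internal) patterns,
    followed by a gated linear unit that filters the conv output."""

    def __init__(self, emb_dim=256, kernel_size=3):
        super().__init__()
        # The conv produces 2x channels because the GLU halves the
        # channel dimension: one half is the signal, the other half
        # parameterizes the gate. Padding preserves sequence length.
        self.conv = nn.Conv1d(emb_dim, 2 * emb_dim, kernel_size,
                              padding=kernel_size // 2)
        self.glu = nn.GLU(dim=1)

    def forward(self, char_embeddings):
        # char_embeddings: (batch, seq_len, emb_dim)
        x = char_embeddings.transpose(1, 2)  # (batch, emb_dim, seq_len)
        x = self.glu(self.conv(x))           # gate filters conv features
        return x.transpose(1, 2)             # (batch, seq_len, emb_dim)

# Example usage with dummy data: a batch of 8 sequences of 120 characters.
frontend = CharConvGLUFrontEnd(emb_dim=256)
chars = torch.randn(8, 120, 256)
out = frontend(chars)   # same shape, locally contextualized and gated
```

The gating step matches the abstract's motivation: since some character-level encoder outputs receive very small decoder weights, a learned gate lets the encoder suppress uninformative positions before they are passed on.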
Keywords/Search Tags:Mongolian-Chinese Machine Translation, Neural Machine Translation, Character-level Machine Translation