
Research On Chinese-Mongolian Neural Machine Translation Based On Monolingual Corpora

Posted on: 2021-05-21  Degree: Master  Type: Thesis
Country: China  Candidate: Y C Cao  Full Text: PDF
GTID: 2428330602996199  Subject: Computer application technology
Abstract/Summary:
Machine translation is an important research direction in natural language processing, and with the rapid development of deep learning, neural machine translation (NMT) has become the mainstream approach in both machine translation research and application. However, NMT still relies heavily on large-scale parallel corpora to achieve good translation quality, so it performs poorly on low-resource language pairs such as Chinese-Mongolian. Compared with parallel corpora, monolingual corpora are far more abundant and easier to obtain, and they play an important role in low-resource machine translation; nevertheless, monolingual corpora have not yet been fully exploited in NMT. Given the shortage of Chinese-Mongolian parallel corpus resources and the complex word-formation rules of Mongolian, this thesis explores the use of monolingual corpora as a supplement to parallel corpora in Chinese-Mongolian NMT and proposes several Chinese-Mongolian NMT methods based on monolingual corpora. The main contributions are as follows:

(1) This dissertation proposes a Chinese-Mongolian NMT method combining word embedding alignment and language modeling. First, Chinese and Mongolian word embeddings are trained on the respective monolingual corpora, and the embedding layers of the translation model are initialized with the aligned Chinese-Mongolian word embeddings. At the same time, the monolingual corpora are used to train language models during translation training, strengthening the encoding and decoding capacity of the model.

(2) This dissertation proposes a Chinese-Mongolian NMT method based on character-level language modeling. Because NMT systems struggle with unknown or low-frequency words, this method splits Chinese and Mongolian words into characters so that the model can handle words that do not appear in the training corpora. In addition, thanks to the dual structure of the model, character-level language modeling can be performed during the translation process, which makes the output more fluent.

(3) This dissertation proposes a Chinese-Mongolian NMT method combining weight sharing and character-aware language-model pre-training. To better exploit the commonality between the two languages, the parameters of the first few encoder layers are shared. In addition, the whole model is pre-trained on monolingual corpora with a character-aware language-modeling objective, and the translation model is initialized from the pre-trained model before translation training begins. Finally, during the first half of translation training, character-level language modeling is added as an auxiliary objective to fine-tune the full model and further improve translation performance.

In summary, this dissertation explores the application of monolingual corpora to Chinese-Mongolian NMT and proposes three methods: word embedding alignment combined with language modeling, character-level language modeling, and weight sharing combined with character-aware language-model pre-training. Experimental results demonstrate that all three monolingual-corpus-based models significantly improve the effectiveness of Chinese-Mongolian NMT.
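The word embedding alignment in method (1) is commonly implemented as an orthogonal (Procrustes) mapping between the two monolingual embedding spaces. The abstract does not specify the alignment algorithm, so the following is a minimal sketch under that assumption, using synthetic embeddings in place of real Chinese/Mongolian vectors; `procrustes_align` is an illustrative name, not from the thesis.

```python
import numpy as np

def procrustes_align(src_emb, tgt_emb):
    """Return an orthogonal matrix W minimizing ||src_emb @ W - tgt_emb||_F.

    Given paired rows (e.g. from a seed bilingual dictionary), the closed-form
    Procrustes solution is W = U @ Vt, where src_emb.T @ tgt_emb = U S Vt.
    """
    u, _, vt = np.linalg.svd(src_emb.T @ tgt_emb)
    return u @ vt

rng = np.random.default_rng(0)
d = 4
# Synthetic "source" embeddings and a hidden orthogonal map producing the "target" space.
src = rng.normal(size=(10, d))
q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal ground-truth mapping
tgt = src @ q

W = procrustes_align(src, tgt)
err = np.linalg.norm(src @ W - tgt)  # near zero: the mapping is recovered
```

In practice the paired rows would come from a small Chinese-Mongolian seed dictionary (or an unsupervised initialization), and the mapped embeddings would then initialize the translation model's embedding layers as described above.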
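The character-level segmentation in method (2) can be sketched as a reversible word-to-character split. The thesis does not give its exact scheme; the sketch below uses the common convention (as in SentencePiece) of marking word-initial characters with "▁" so the original sentence can be recovered. Function names are illustrative.

```python
def to_char_level(sentence, boundary="\u2581"):
    """Split each whitespace-separated word into characters,
    prefixing the first character of every word with a boundary mark."""
    tokens = []
    for word in sentence.split():
        chars = list(word)
        chars[0] = boundary + chars[0]
        tokens.extend(chars)
    return tokens

def from_char_level(tokens, boundary="\u2581"):
    """Invert to_char_level: boundary marks become word separators."""
    out = []
    for t in tokens:
        if t.startswith(boundary):
            out.append(" " + t[len(boundary):])
        else:
            out.append(t)
    return "".join(out).strip()
```

Because every sentence reduces to a small, closed character vocabulary, no word is ever out-of-vocabulary, which is the property the method relies on for unknown and low-frequency words.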
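The weight sharing in method (3) amounts to having both language encoders reference the same parameters for their first few layers, while keeping upper layers language-specific. The thesis does not describe its implementation; the toy numpy sketch below only illustrates the parameter-sharing structure (in a real framework such as PyTorch this would be the same module object reused in both encoders).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Lower layer shared by both language encoders: both paths reference the
# SAME array, so training through either language updates it.
shared_w = rng.normal(size=(d, d))

# Language-specific upper layers (illustrative names).
zh_top = rng.normal(size=(d, d))
mn_top = rng.normal(size=(d, d))

def encode(x, top_w):
    h = np.tanh(x @ shared_w)  # shared first layer captures cross-lingual commonality
    return np.tanh(h @ top_w)  # language-specific layer on top

x = rng.normal(size=(1, d))
zh_out = encode(x, zh_top)
mn_out = encode(x, mn_top)
```

Only the split between shared and private parameters is shown here; the number of shared layers is a hyperparameter ("the first few layers" in the abstract).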
Keywords/Search Tags:Chinese-Mongolian neural machine translation, Monolingual corpora, Word embedding alignment, Language modeling, Weight sharing