Research On Mongolian-Chinese Machine Translation Based On Semi-Supervised Method

Posted on: 2021-05-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Y Wu
GTID: 2428330620476431 | Subject: Computer Science and Technology
Abstract/Summary:
In recent years, advances in deep learning and the availability of large-scale parallel corpora have brought remarkable progress in machine translation, but these successes still depend on large amounts of parallel data. The scarcity of bilingual parallel corpora makes it difficult to improve the performance of Mongolian-Chinese machine translation models. In this thesis, we construct a Mongolian-Chinese neural machine translation model based on an unsupervised method that uses monolingual data, which effectively alleviates the scarcity of parallel corpora in Mongolian-Chinese machine translation tasks, and we extend this method to a phrase-based statistical machine translation model to improve performance further. Finally, the two systems are trained jointly to obtain a better translation model. The specific work of this thesis is as follows:

(1) This thesis proposes a method for constructing an unsupervised Mongolian-Chinese neural machine translation model based on monolingual corpora. The model uses Bi-LSTM and Transformer network structures. During training, the model uses only monolingual corpora: it uses a self-learning method to train cross-language word embeddings from the monolingual word embeddings of the Mongolian and Chinese sides, yielding a bilingual dictionary, and then initializes the translation model from the bilingual dictionary and the Chinese language model. In an unsupervised machine translation system, iterative back-translation can effectively expand the corpus, reduce the model's dependence on parallel corpora, and alleviate the scarcity of parallel corpora in the Mongolian-Chinese machine translation task.

(2) This thesis implements a semi-supervised Mongolian-Chinese Phrase-Based Statistical Machine Translation model. First, we use an unsupervised method to train the unsupervised Mongolian-Chinese Phrase-Based Statistical Machine Translation model; we then use Moses to train on the pseudo-parallel corpus and the supervised Statistical Machine Translation model, in order to improve the translation performance of the Mongolian-Chinese Phrase-Based Statistical Machine Translation model.

(3) This thesis implements a joint training method for the two models. In the Mongolian-Chinese Phrase-Based Statistical Machine Translation model, the translation unit is the phrase segment, which also avoids the problem of local ordering, better retains the structural information of the sentence, and improves translation quality. Therefore, in this thesis we train the semi-supervised Mongolian-Chinese Neural Machine Translation model and the semi-supervised Mongolian-Chinese Phrase-Based Statistical Machine Translation model under the EM framework to further improve the translation performance of the Mongolian-Chinese machine translation model.

The experiments in this thesis show that, for Mongolian and Chinese, which have low similarity, the self-learning method trains cross-language word embeddings better than the method based on generative adversarial networks. The unsupervised training method can expand the corpus and improve the performance of machine translation models for low-resource languages. The unsupervised Mongolian-Chinese Neural Machine Translation model in this thesis reaches a BLEU score of 18.76. Applying the unsupervised method to the Mongolian-Chinese Phrase-Based Statistical Machine Translation model gives a better result than the unsupervised Neural Machine Translation model, with a BLEU score of 27.15. Because the pseudo corpus obtained by the unsupervised method contains a large amount of noise, which hinders the model's semantic extraction, this thesis improves the unsupervised model and combines the advantages of the Statistical Machine Translation model and the Neural Machine Translation model for joint training. The resulting Mongolian-Chinese machine translation model outperforms each single system, with a BLEU score of 38.16. It also outperforms the supervised Mongolian-Chinese Neural Machine Translation model, and the result is relevant to research on Mongolian-Chinese and other low-resource machine translation tasks.
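
The BLEU scores quoted in the abstract (18.76, 27.15, 38.16) are on the usual 0-100 scale. As a rough illustration of what such a score measures, the following is a minimal sketch of corpus-level BLEU. It assumes one reference per hypothesis, pre-tokenized input, uniform weights over 1- to 4-grams, and no smoothing; the abstract does not state the thesis's tokenization or scoring tool, so these settings and the function names `ngram_counts` and `corpus_bleu` are illustrative assumptions, not the author's evaluation setup.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (hypothetical helper).

    Assumes exactly one reference per hypothesis and no smoothing.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # hypothesis n-grams, per order
    hyp_len = 0
    ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(
                min(count, ref_counts[gram]) for gram, count in hyp_counts.items()
            )
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    hyps = [["the", "cat", "sat", "on", "the", "mat"]]
    refs = [["the", "cat", "sat", "on", "a", "mat"]]
    print(f"BLEU = {corpus_bleu(hyps, refs):.2f}")
```

Scores from a sketch like this are only comparable when tokenization and settings are held fixed across systems, which is why reported BLEU values are usually produced with a single standard tool.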
Keywords/Search Tags: Mongolian-Chinese Machine Translation, Cross-language Word Embedding, Self-Learning Method, Iterative Back-Translation, Semi-Supervised Method, Joint Training