
Research On Unsupervised Mongolian-Chinese Neural Machine Translation Combined With Parallel Sentence Extraction

Posted on: 2022-01-06 | Degree: Master | Type: Thesis
Country: China | Candidate: H Wang | Full Text: PDF
GTID: 2518306542978149 | Subject: Computer application technology

Abstract/Summary:
From traditional statistical machine translation to current neural machine translation, great progress has been made in both the speed and the accuracy of the translations produced by translation models. Behind this progress, large amounts of high-quality parallel corpora are indispensable as training support for the translation model. However, high-quality Mongolian-Chinese parallel corpora are currently in serious shortage. How to use the large amount of existing monolingual corpus data for unsupervised training, so as to alleviate the low translation quality caused by insufficient parallel corpus resources, has therefore become an important research topic in Mongolian-Chinese machine translation. This thesis proposes an unsupervised Mongolian-Chinese neural machine translation method combined with parallel sentence extraction, which focuses on optimizing dictionary induction, language model training, and back-translation training in the traditional unsupervised translation framework. The specific work is as follows:

(1) To alleviate the low accuracy of unsupervised Mongolian-Chinese dictionary induction based on adversarial learning, an unsupervised Mongolian-Chinese dictionary induction method based on a translation model is proposed. First, unsupervised Mongolian-Chinese statistical machine translation models of different granularities are built, and a Mongolian-Chinese bilingual dictionary is induced from the translation model by statistical word alignment. Compared with dictionary induction by adversarial learning, translation based on the dictionary induced from the translation model is more accurate. The unsupervised translation model is then initialized by combining this bilingual dictionary with a language model trained as a denoising autoencoder, and an unsupervised Mongolian-Chinese neural machine translation model is obtained by iterative back-translation training. The translations of this model achieve a higher BLEU score than those of the model initialized with the adversarially induced dictionary.

(2) To address the problem that translations from the unsupervised translation model based on the denoising autoencoder are not natural and accurate enough, the Mongolian-Chinese language models are pre-trained with the masked sequence-to-sequence (MASS) method. First, Mongolian and Chinese monolingual corpora with random masking are used to train the encoder-attention-decoder structure of the Transformer, yielding Mongolian and Chinese language models. Then, by combining the bilingual dictionary induced from the translation model with back-translation training, an unsupervised Mongolian-Chinese neural machine translation model based on the MASS pre-trained language model is built; its training speed and translation accuracy are better than those of the unsupervised Mongolian-Chinese translation model based on the denoising autoencoder.
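To make the MASS-style pre-training step in (2) concrete, the following is a minimal sketch of how a single monolingual sentence can be turned into an encoder-input / decoder-target pair by masking one contiguous span. The function name, mask token, and mask ratio are illustrative assumptions, not the preprocessing actually used in the thesis:

    import random

    MASK = "[MASK]"  # placeholder mask symbol; the real vocabulary token may differ

    def mass_mask(tokens, mask_ratio=0.5):
        """MASS-style masking: hide one contiguous span on the encoder side;
        the decoder is trained to reconstruct exactly that span."""
        span_len = max(1, int(len(tokens) * mask_ratio))
        start = random.randint(0, len(tokens) - span_len)
        encoder_input = tokens[:start] + [MASK] * span_len + tokens[start + span_len:]
        decoder_target = tokens[start:start + span_len]
        return encoder_input, decoder_target

    # Usage: one pre-training example from a whitespace-tokenised monolingual sentence.
    sentence = "the model is trained on monolingual data only".split()
    enc_in, dec_out = mass_mask(sentence)
    print(enc_in)   # e.g. ['the', 'model', '[MASK]', '[MASK]', '[MASK]', '[MASK]', 'data', 'only']
    print(dec_out)  # e.g. ['is', 'trained', 'on', 'monolingual']

Because both the encoder input and the decoder target come from the same monolingual sentence, this objective needs no parallel data, which is what makes it usable in the unsupervised Mongolian-Chinese setting described above.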
(3) To further optimize the unsupervised Mongolian-Chinese neural machine translation model, an unsupervised Mongolian-Chinese parallel sentence extraction method based on a Mongolian-Chinese synthetic dictionary is proposed. First, candidate sentence pairs are scored by a weighted combination of word similarity and parallel sentence-segment detection, and a threshold is set to mine high-quality Mongolian-Chinese parallel sentences from the Mongolian-Chinese monolingual comparable corpus built in this thesis. The extracted parallel sentences are then added to the pseudo-parallel corpus generated by back-translation training, which further speeds up the convergence of the unsupervised Mongolian-Chinese neural machine translation model and improves the BLEU score of its translations.
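As an illustration of the mining step in (3), the sketch below scores a candidate sentence pair by how many source words have a dictionary translation appearing in the candidate target sentence, and keeps pairs whose score reaches a threshold. The synthetic dictionary is modelled as a plain Python dict, and the function names, the coverage-only score, and the threshold value are assumptions for illustration; the thesis additionally weights in parallel sentence-segment detection:

    def dict_coverage(src_tokens, tgt_tokens, bilingual_dict):
        """Fraction of source tokens whose dictionary translation occurs
        in the candidate target sentence (a simple word-similarity proxy)."""
        if not src_tokens:
            return 0.0
        tgt_set = set(tgt_tokens)
        hits = sum(1 for w in src_tokens
                   if any(t in tgt_set for t in bilingual_dict.get(w, ())))
        return hits / len(src_tokens)

    def extract_parallel(src_sents, tgt_sents, bilingual_dict, threshold=0.6):
        """Keep, for each source sentence, the best-scoring target candidate
        whose coverage score reaches the threshold."""
        pairs = []
        for src in src_sents:
            scored = [(dict_coverage(src.split(), tgt.split(), bilingual_dict), tgt)
                      for tgt in tgt_sents]
            score, best = max(scored)
            if score >= threshold:
                pairs.append((src, best, score))
        return pairs

The sentence pairs retained this way would simply be concatenated with the back-translated pseudo-parallel data before the next training round, which is how the extraction step feeds the unsupervised translation model in the thesis.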
Keywords/Search Tags: Unsupervised Learning, Mongolian-Chinese Neural Machine Translation, Parallel Sentence Extraction, Lexicon Induction, Pre-Training Method