Implementation And Optimization Of Mongolian-Chinese Neural Machine Translation System Based On Dual Learning

Posted on: 2022-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: S Sun
GTID: 2518306509954439
Subject: Computer technology

Abstract/Summary:
Deep learning has made great progress in machine translation thanks to the ability of neural networks to model semantics in depth. For low-resource languages, however, a persistent difficulty is data sparsity caused by the lack of large-scale bilingual corpora, which leads to poor translation quality. Common remedies such as unsupervised methods introduce additional noise and reduce learning efficiency. To address this, this thesis uses a semi-supervised dual learning method to build a Mongolian-Chinese neural machine translation model: the two dual tasks form a closed-loop feedback system, feedback is obtained from unlabeled data, and this feedback is used to improve both translation models in the dual task. In addition, the thesis adopts an automatic post-editing model based on a copy mechanism to further improve the fluency and faithfulness of the machine translation. Through a copy-generation mechanism, the model automatically decides which words in the translation can be used directly and which need to be edited, improving the final translation quality. The specific work of this thesis is as follows:

(1) Construct a Mongolian-Chinese neural machine translation model based on dual learning. First, large-scale Mongolian and Chinese monolingual corpora are used to train two language models. Then a large pseudo-bilingual corpus generated by back-translation is combined with the real bilingual corpus to jointly train the initial Mongolian-Chinese and Chinese-Mongolian translation models, and the ratio of pseudo to real corpus is adjusted across iterations during training. This lets the model learn language representations while reducing the risk of noise. Finally, dual learning is used to jointly train the two translation models and the two language models, combining a translation-quality reward and a fluency reward to further optimize the model parameters in both the source-to-target and target-to-source directions and thereby improve translation quality.

(2) Construct an automatic post-editing model for the translations. First, the Mongolian source sentence and its Chinese machine translation are jointly encoded in an interactive manner to obtain an interactive representation of the two languages. Then a prediction module uses this representation to predict which words in the machine translation should be copied. Finally, the prediction and the copy mechanism are combined into an automatic post-editing model that further improves the fluency and fidelity of the Chinese translation.

(3) Implement the Mongolian-Chinese translation system. The system adopts a browser/server architecture and supports translation in both directions between Mongolian and Chinese.

Experiments verify that expanding the corpus with back-translation alleviates the over-fitting caused by the small scale of the Mongolian-Chinese bilingual data. They also show that, compared with training the initial translation model on the full combined pseudo and real bilingual corpus at once, the iterative knowledge refinement method learns semantic information more fully and reduces noise. Dual learning then further improves the performance of both translation models: the Mongolian-Chinese Transformer model trained with dual learning reaches a BLEU score of 37.09. The Chinese machine translations produced by the dual-learning model are then used to further train the automatic post-editing model. Experiments show that the post-edited translations have higher fluency and fidelity, with BLEU reaching 39.34, which indicates that automatic post-editing is also suitable for low-resource translation tasks.
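The corpus-mixing step in (1) can be sketched as below. The abstract states only that back-translated pseudo-parallel data is combined with the real bilingual corpus and that their ratio is adjusted over iterations; the linear decay schedule, its start and end values, and the function names `back_translate`, `pseudo_ratio` and `mix_corpora` are hypothetical choices for illustration, not the thesis's actual procedure. The reverse model is a stub.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

def back_translate(target_mono: List[str],
                   reverse_model: Callable[[str], str]) -> List[Pair]:
    """Build pseudo-parallel pairs: synthetic source, genuine target.

    `reverse_model` stands in for a trained target->source system
    (hypothetical; any callable mapping a sentence to a sentence works here).
    """
    return [(reverse_model(t), t) for t in target_mono]

def pseudo_ratio(iteration: int, num_iterations: int,
                 start: float = 0.8, end: float = 0.2) -> float:
    """Hypothetical linear schedule: the pseudo-pair share drops each iteration."""
    if num_iterations <= 1:
        return end
    step = (start - end) / (num_iterations - 1)
    return start - step * iteration

def mix_corpora(real: List[Pair], pseudo: List[Pair], ratio: float,
                rng: random.Random) -> List[Pair]:
    """Keep every real pair and add enough pseudo pairs that they make up
    `ratio` of the mixed corpus (capped by the pseudo pairs available)."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    wanted = int(round(len(real) * ratio / (1.0 - ratio)))
    chosen = rng.sample(pseudo, min(wanted, len(pseudo)))
    mixed = real + chosen  # new list; `real` itself is not modified
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    rng = random.Random(0)
    real = [(f"mn{i}", f"zh{i}") for i in range(10)]
    pseudo = back_translate([f"zh_mono{i}" for i in range(100)],
                            lambda s: "bt_" + s)  # stub reverse model
    for it in range(4):
        corpus = mix_corpora(real, pseudo, pseudo_ratio(it, 4), rng)
        print(it, len(corpus))  # one training round on `corpus` would go here
```

Early rounds lean on the plentiful pseudo pairs, while later rounds shift weight toward the real bilingual data. This is one plausible reading of "adjusting the ratio through iterations" that limits the influence of back-translation noise.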
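The abstract says the dual-learning stage combines a translation-quality reward and a fluency reward. A minimal sketch, assuming the formulation of the original dual-learning paper (He et al., 2016), follows. For a sentence s in language A, sample K translations s_mid. The fluency reward r1 is the target language model's log-probability of s_mid, and the reconstruction reward r2 is log P(s | s_mid) under the reverse model. They combine as r = α·r1 + (1 − α)·r2. Whether the thesis uses exactly this weighting is an assumption, and the `Sample` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Sample:
    """One sampled intermediate translation s_mid of a source sentence s.

    All three scores are log-probabilities that a real system would get
    from its models; here they are supplied by hand (hypothetical values).
    """
    lm_logprob: float     # log P_LM_B(s_mid): fluency under the target-side LM
    recon_logprob: float  # log P(s | s_mid; theta_BA): reconstruction score
    fwd_logprob: float    # log P(s_mid | s; theta_AB): forward-model score

def dual_rewards(samples: List[Sample], alpha: float) -> List[float]:
    """r_k = alpha * r1_k + (1 - alpha) * r2_k for each sample k."""
    return [alpha * s.lm_logprob + (1.0 - alpha) * s.recon_logprob
            for s in samples]

def dual_losses(samples: List[Sample], alpha: float) -> Tuple[float, float]:
    """Surrogate losses whose gradients are the negated policy-gradient estimates.

    Rewards are treated as constants (no gradient flows through them):
      loss_AB = -(1/K) * sum_k r_k * log P(s_mid_k | s)            (A->B)
      loss_BA = -(1/K) * sum_k (1 - alpha) * log P(s | s_mid_k)    (B->A)
    """
    k = len(samples)
    rewards = dual_rewards(samples, alpha)
    loss_ab = -sum(r * s.fwd_logprob for r, s in zip(rewards, samples)) / k
    loss_ba = -sum((1.0 - alpha) * s.recon_logprob for s in samples) / k
    return loss_ab, loss_ba
```

With plain floats this only evaluates the losses. In an autograd framework `fwd_logprob` and `recon_logprob` would be differentiable tensors, the rewards would be detached, and the same step would then be repeated starting from a sentence in language B so that both directions are updated.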
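The abstract does not give the post-editing model's equations. A common way to combine copying and generation is the pointer-generator mixture of See et al. (2017), and the sketch below assumes that form. The copy distribution is taken from attention weights over the machine-translation tokens, and a scalar gate `p_gen` balances generating against copying. In the thesis these quantities would come from the interactive encoder and the copy-prediction module; here they are passed in as plain numbers, and the function name is hypothetical.

```python
from typing import Dict, List

def copy_generate_distribution(p_vocab: Dict[str, float],
                               mt_tokens: List[str],
                               attention: List[float],
                               p_gen: float) -> Dict[str, float]:
    """Mix a generation distribution with a copy distribution over MT tokens.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum_{i: mt_i == w} attention_i

    Tokens that appear only in the machine translation still receive
    probability through the copy term, so they can be kept verbatim.
    """
    if len(mt_tokens) != len(attention):
        raise ValueError("need one attention weight per MT token")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must be in [0, 1]")
    # Generation part, scaled by the gate.
    final = {w: p_gen * p for w, p in p_vocab.items()}
    # Copy part: attention mass is added to the token at each MT position.
    for tok, a in zip(mt_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * a
    return final

example = copy_generate_distribution(
    {"我们": 0.5, "学习": 0.3, "<unk>": 0.2},   # toy generation distribution
    ["我们", "学习", "蒙古语"],                  # toy MT output tokens
    [0.25, 0.25, 0.5],                          # toy attention weights
    0.5,
)
```

If `p_vocab` and `attention` each sum to 1, the mixture also sums to 1. A gate near 0 makes the post-editor mostly keep words from the machine translation, while a gate near 1 lets it rewrite them from the vocabulary.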
Keywords/Search Tags: Mongolian-Chinese Machine Translation, Dual Learning Method, Iterative Refinement, Back-Translation, Automatic Post-Editing