Research On Chinese-Korean Neural Machine Translation Method Based On Transfer Learning

Posted on: 2022-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
GTID: 2518306338956139
Subject: Computer application technology
Abstract/Summary:
Translation is essential to the exchange of human thought. Intelligent translation technology accelerates the integration of different civilizations and promotes the development of human society. Deep learning has been applied successfully to modern machine translation and achieves good results on many language pairs. However, neural machine translation models are limited by the amount of available training data, and translation quality is unsatisfactory for low-resource language pairs. This dissertation proposes a transfer learning-based neural machine translation method for Chinese-Korean bilingual parallel corpora to improve translation performance. The main research work is summarized as follows.

First, we studied the automatic alignment of Chinese and Korean sentences and proposed a sentence alignment algorithm that exploits the Sino-Korean words in Korean text. The corpus is split into sentences, and the sentences are then aligned according to alignment probabilities using a dynamic programming algorithm.

Second, we proposed a Chinese-Korean neural machine translation method based on weight sharing. A parent model is trained under the encoder-decoder framework, and its network weights are passed to the child model. The vocabularies of the parent and child models are merged, the word vectors of the child model are represented with this common vocabulary, and the child model is then trained until convergence.

Finally, we proposed a method that incorporates a pre-trained language model. The BERT network structure is used as the encoder of the machine translation model, and the Transformer model is initialized from BERT. WordPiece encoding is used to segment the Chinese-Korean parallel corpus into subwords, which reduces the influence of unregistered (out-of-vocabulary) words.

The proposed methods address the problems of unregistered words and long-sentence processing, and perform well in semantic fluency. The weight-sharing-based Chinese-Korean neural machine translation model achieves a BLEU score of 15.36, which is 2.68 higher than its baseline model. The translation model combined with the pre-trained model achieves a BLEU score of 31.61, which is 1.74 higher than its baseline model. These results indicate that the proposed Chinese-Korean neural translation models can effectively translate Chinese text into Korean text when bilingual parallel corpora are insufficient.
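The weight-sharing step described above (merge the parent and child vocabularies, then reuse parent word vectors in the child model) can be illustrated with a minimal sketch. The abstract does not say how the vocabularies are merged or how new tokens are initialized, so everything here is an assumption for illustration: the function names `build_shared_vocab` and `init_child_embeddings`, the toy vocabularies, the two-dimensional vectors, and the uniform random initialization for tokens the parent has never seen.

```python
import random


def build_shared_vocab(parent_vocab, child_vocab):
    """Merge two token lists into one vocabulary, keeping parent tokens first."""
    shared = list(parent_vocab)
    seen = set(parent_vocab)
    for tok in child_vocab:
        if tok not in seen:
            shared.append(tok)
            seen.add(tok)
    return shared


def init_child_embeddings(parent_vocab, parent_emb, shared_vocab, dim, seed=0):
    """Copy parent vectors for known tokens; randomly initialize the rest.

    The uniform(-0.1, 0.1) initialization is an assumption, not the thesis's choice.
    """
    rng = random.Random(seed)
    parent_index = {tok: i for i, tok in enumerate(parent_vocab)}
    child_emb = []
    for tok in shared_vocab:
        if tok in parent_index:
            # Token exists in the parent model: reuse its trained vector.
            child_emb.append(list(parent_emb[parent_index[tok]]))
        else:
            # Token is new to the child model: start from small random values.
            child_emb.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return child_emb


# Toy data (hypothetical): a parent vocabulary with trained 2-d vectors,
# and a child vocabulary that overlaps with it only partly.
parent_vocab = ["<unk>", "the", "学", "校"]
parent_emb = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
child_vocab = ["<unk>", "学", "학", "교"]

shared_vocab = build_shared_vocab(parent_vocab, child_vocab)
child_emb = init_child_embeddings(parent_vocab, parent_emb, shared_vocab, dim=2)
```

In this sketch the child embedding table has one row per token in the shared vocabulary. Rows for tokens already known to the parent start from the parent's trained values, and only the rows for new tokens start from scratch. In a real system the remaining encoder and decoder weights would be copied in the same spirit before fine-tuning on the Chinese-Korean corpus.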
Keywords/Search Tags: Chinese-Korean neural machine translation, transfer learning, weight sharing, pre-trained language model, Chinese-Korean sentence alignment