
Research On Mongolian-Chinese Neural Machine Translation Based On Monolingual Corpus And Reinforcement Learning

Posted on: 2022-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: L L Bian
Full Text: PDF
GTID: 2518306542976609
Subject: Master of Engineering
Abstract/Summary:
With the rapid growth of China's economy, cooperation between regions and exchanges between ethnic groups are increasing. Mongolian is the language used by the Mongolian people in China, and Mongolian-to-Chinese translation plays a pivotal role in the common development of ethnic groups. At present, the quality of Mongolian-Chinese machine translation is still unsatisfactory. One reason is that the Mongolian-Chinese machine translation model is an end-to-end model: during training it is conditioned on annotated, aligned reference data, whereas during inference it is conditioned on its own predictions. In addition, the cross-entropy loss function used in training is not consistent with the BLEU score used in evaluation. This thesis introduces reinforcement learning into Mongolian-Chinese machine translation to address these two problems. Furthermore, the scarcity of Mongolian-Chinese aligned corpora limits the improvement of model quality. To obtain additional information from monolingual data, this thesis uses Mongolian monolingual word vector embeddings and back translation of target-side monolingual data to improve the model. The specific work is as follows:

(1) The corpus is segmented to explore the influence of different levels of granularity on the model. The influence of character-level, word-level, and subword-level segmentation on Mongolian-Chinese neural machine translation is explored. Two ways of applying the BPE algorithm, independent BPE and joint BPE, are also compared. The experiments show that independent BPE is more beneficial to the Mongolian-Chinese translation model.

(2) The quality of the word vectors directly affects the quality of the final model. The thesis embeds a large number of Mongolian monolingual word vectors into the model to improve translation quality. Three different word vector generation models are used to generate word vectors from Mongolian monolingual data, and the resulting vectors are inspected visually. Experiments compare the effect of the word vectors generated by the different models on the quality of the translation model and explore the effect of four word vector dimensions. Finally, a denoising autoencoder is introduced into the translation model to further improve the fluency of the translations. The experimental results show that the model improves significantly when a large number of monolingual word vectors are embedded into the Mongolian-Chinese translation model and a denoising autoencoder is used to learn language features. Moreover, under the current experimental conditions, the Mongolian-Chinese translation model performs best when the word vector dimension is 512.

(3) Reinforcement learning is introduced into the training of the Mongolian-to-Chinese translation model, so that both training and inference are conditioned on the model's own predictions, which addresses the inconsistency between training and inference. At the same time, the BLEU score takes part directly in training, which addresses the inconsistency between the training objective and the evaluation metric. For the reward, a terminal reward and a word-level reward are each set up and compared experimentally. A linear combination with the cross-entropy loss function yields better experimental results, and methods of generating translations are also explored experimentally. Finally, target-side monolingual back translation is combined with reinforcement learning training in the Mongolian-Chinese machine translation task to improve model quality, achieving the highest BLEU score in this work...
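The reward design and loss combination described in point (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the reward formulas, so the add-one smoothed sentence-level BLEU, the prefix-difference word-level reward, the REINFORCE-style loss with a baseline, and the mixing weight `alpha` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Add-one smoothed sentence-level BLEU in [0, 1] (an assumed simplification)."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, r[g]) for g, c in h.items())  # clipped n-gram matches
        total = max(len(hyp) - n + 1, 0)
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # penalize short outputs
    return brevity * math.exp(log_prec)

def word_level_rewards(hyp, ref):
    """Assumed word-level reward: the BLEU gain from adding each token.

    The per-token rewards sum to the terminal reward sentence_bleu(hyp, ref).
    """
    rewards, prev = [], 0.0
    for t in range(1, len(hyp) + 1):
        cur = sentence_bleu(hyp[:t], ref)
        rewards.append(cur - prev)
        prev = cur
    return rewards

def mixed_loss(ref_logprobs, sample_logprobs, reward, baseline, alpha):
    """Linear combination of cross-entropy and a REINFORCE-style loss.

    ref_logprobs:    log-probabilities of the reference tokens (teacher forcing)
    sample_logprobs: log-probabilities of the tokens the model sampled itself
    reward:          terminal reward of the sampled sentence, e.g. its BLEU
    baseline:        hypothetical variance-reduction baseline
    alpha:           hypothetical mixing weight in [0, 1]
    """
    ce = -sum(ref_logprobs)
    rl = -(reward - baseline) * sum(sample_logprobs)
    return alpha * ce + (1 - alpha) * rl

# Usage with made-up tokens and log-probabilities.
ref = ["我", "喜欢", "读", "书"]
sample = ["我", "喜欢", "书"]
terminal = sentence_bleu(sample, ref)
per_token = word_level_rewards(sample, ref)
loss = mixed_loss([-0.2, -0.4, -0.3, -0.1], [-0.5, -0.6, -0.9],
                  reward=terminal, baseline=0.3, alpha=0.5)
print(terminal, per_token, loss)
```

In this sketch, `alpha = 1` reduces the objective to ordinary cross-entropy training and `alpha = 0` to pure reward-driven training; intermediate values give the kind of linear combination the abstract mentions.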
Keywords/Search Tags: Reinforcement Learning, Mongolian-Chinese Machine Translation, Back Translation, Monolingual Vector, Byte Pair Encoding