Research on Neural Machine Translation Methods Incorporating Pre-trained Language Model Knowledge

Posted on: 2024-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Chen
Full Text: PDF
GTID: 2568307109988069
Subject: Artificial intelligence
Abstract/Summary:
Pre-training has been shown to be a very effective way to improve performance on many natural language processing (NLP) tasks. BERT is one of the most widely used pre-trained language models today. By adding a task-specific layer on top of it, BERT can easily be converted into a model dedicated to a specific task, and its performance can be improved by fine-tuning on labeled data. This practice has been applied in various NLP scenarios and has produced many state-of-the-art results. Combining BERT with neural machine translation (NMT) models has attracted wide attention. However, using BERT in NMT is not as simple as in other NLP tasks, and there are two problems. First, NMT models are mostly deep neural networks with parameter counts comparable to or even larger than BERT's, and most existing NMT models are trained on a large number of samples. Fine-tuning BERT on these labeled corpora therefore requires a large number of update steps to adapt to the task, which leads to catastrophic forgetting. Second, it is unclear how to make the NMT model take full advantage of BERT's pre-trained knowledge. To address these problems, we propose several solutions. The main work of this thesis is as follows:

1. Mitigating the catastrophic forgetting of BERT on the translation task: To address the catastrophic forgetting caused by fine-tuning BERT on the translation task, we analyze its causes and propose a masking matrix strategy (Masking) to alleviate it. We conducted experiments on multiple translation datasets, and the results show that our approach effectively alleviates catastrophic forgetting and successfully uses BERT's knowledge to improve the performance of the NMT model.

2. Neural machine translation based on the mask matrix strategy and a BERT-attention mechanism: To further strengthen the NMT model's use of BERT's output information, rather than simply using the output of its last hidden layer as the input of the decoder side of the NMT model, we propose a BERT-enhanced neural machine translation (BE-NMT) model. It improves the performance of the NMT model through the following parts: (1) we adopt the mask matrix strategy to alleviate the catastrophic forgetting caused by fine-tuning BERT on the translation task; (2) to make full use of BERT's output representation, we incorporate it as an additional feature into each layer of the encoder and decoder of the NMT model through an attention mechanism; (3) because additional attention mechanisms are introduced, we propose a new approach of internal fusion and dynamic weighted fusion to better balance the multiple attention mechanisms; (4) based on the mask matrix strategy, we analyze the linguistic information contained in each layer of BERT and find that some linguistic information missing from the output of the last hidden layer can benefit the translation task to some extent, so we fuse information from multiple layers. We conducted experiments on multiple translation datasets; compared with strong baseline models, our method significantly improves translation performance.

3. Construction of a neural machine translation prototype system: Based on the above research, we build an NMT prototype system. The system consists of three modules: a sentence input and output module, an input sentence preprocessing module, and a neural machine translation module.
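
The abstract does not give formulas for the multi-layer fusion or the dynamic weighted fusion of attention mechanisms, so the following is only a minimal sketch under stated assumptions, not the thesis's actual method. It assumes (a) BERT layers are combined by a softmax-weighted sum, and (b) the self-attention output and the BERT-attention output are blended by a scalar sigmoid gate. The function names `mix_bert_layers` and `gated_fusion`, and all parameters, are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_bert_layers(layer_outputs, layer_scores):
    """Assumed multi-layer fusion: softmax-weighted sum of per-layer vectors.

    layer_outputs: one vector (list of floats) per BERT layer for a token.
    layer_scores:  one learnable scalar per layer (hypothetical parameters).
    """
    weights = softmax(layer_scores)
    dim = len(layer_outputs[0])
    return [
        sum(w * layer[i] for w, layer in zip(weights, layer_outputs))
        for i in range(dim)
    ]


def gated_fusion(self_attn_out, bert_attn_out, gate_weights, gate_bias=0.0):
    """Assumed dynamic weighted fusion of two attention outputs.

    A scalar gate g is computed from the concatenated outputs; the result
    is g * self_attn_out + (1 - g) * bert_attn_out.
    """
    features = self_attn_out + bert_attn_out  # list concatenation
    g = sigmoid(sum(w * f for w, f in zip(gate_weights, features)) + gate_bias)
    fused = [g * a + (1.0 - g) * b for a, b in zip(self_attn_out, bert_attn_out)]
    return fused, g


# Toy usage: two "BERT layers" with equal scores, and a zero-initialized gate.
bert_vec = mix_bert_layers([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
fused, g = gated_fusion([1.0, 2.0], bert_vec, [0.0, 0.0, 0.0, 0.0])
```

With zero gate weights the gate sits at 0.5, so both attention sources contribute equally; in a trained model the gate parameters would be learned, letting each layer decide how much to rely on BERT's representation.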
Keywords/Search Tags:Neural network, Machine translation, Deep learning, Pre-trained language model, Natural language processing