
Parallel Sequence Decoding In Neural Machine Translation

Posted on: 2022-02-11    Degree: Doctor    Type: Dissertation
Country: China    Candidate: J L Guo    Full Text: PDF
GTID: 1488306611974819    Subject: Computer application technology
Abstract/Summary:
Machine translation is an important research topic in artificial intelligence and natural language processing. Its goal is to automatically transform sentences in a source language into sentences in a target language by means of a model, while ensuring that the source sentence and the translation result carry the same semantic information. In recent years, Neural Machine Translation (NMT) models have become the leading approach to this task. They are usually based on the sequence-to-sequence framework: the encoder encodes the source sentence into a hidden representation of fixed dimension, and the decoder then generates the target translation word by word, from left to right, in an autoregressive manner, conditioned on the hidden representation of the source sentence. NMT models have achieved great success on a variety of translation tasks, outperforming statistical machine translation methods for most language pairs and even reaching human parity on some tasks. However, because of the autoregressive generation manner, the decoding speed of NMT models is relatively slow, which has become the bottleneck for applying them in real machine translation systems.

To deal with this problem, researchers have proposed parallel sequence decoding methods for NMT. When generating the target sentence, instead of following the autoregressive manner, the model generates the target tokens at all positions in parallel, which significantly improves the decoding speed. However, compared with autoregressive models, the translation accuracy of parallel decoding models is sacrificed, because the context information between target words is not exploited during prediction, and this hinders the large-scale application of such models. To solve this problem, this thesis conducts an in-depth study of parallel sequence decoding models and improves their translation accuracy while maintaining the advantage in decoding speed, from three perspectives: model architecture optimization, decoding algorithm optimization, and the incorporation of pre-trained models.

1. From the perspective of the model architecture, this thesis makes improvements in two aspects. First, the decoder input of the original parallel decoding model is a copy of the source sentence, which contains little target-side information and makes the decoder hard to optimize. This thesis proposes a parallel decoding model with an enhanced decoder input to alleviate this problem. By transforming the source sentence at both the token level and the embedding level and feeding the result to the decoder, the model introduces target-side information into the decoder input and thus reduces the difficulty of optimizing the decoder. In experiments, the model significantly outperforms baseline parallel decoding models on various machine translation benchmark datasets, while achieving more than a tenfold speedup in inference latency compared with the autoregressive model. Second, considering that the parallel decoding model and the autoregressive model have similar architectures, it is natural to apply transfer learning and improve the training of the parallel decoding model with the knowledge contained in the autoregressive model. However, because the two models differ in their training paradigms, naively applying transfer learning does not lead to good results. Therefore, this thesis proposes a transfer learning method based on curriculum learning that achieves a smooth transition between the two training paradigms, makes better use of the information contained in the autoregressive model, and ultimately improves translation accuracy.
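To make the contrast with autoregressive decoding concrete, the following is a minimal PyTorch sketch (not the thesis code) of a parallel decoding model whose decoder input is simply a length-adjusted copy of the source embeddings, so that all target positions are predicted in a single forward pass rather than left to right. All module names and hyper-parameters are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ParallelDecoderSketch(nn.Module):
    """Toy non-autoregressive translation model: one forward pass per sentence."""

    def __init__(self, vocab_size: int, d_model: int = 512, nhead: int = 8, layers: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), layers)
        self.project = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens: torch.Tensor, tgt_len: int) -> torch.Tensor:
        # Encode the source sentence (positional encodings omitted for brevity).
        src_emb = self.embed(src_tokens)              # (batch, src_len, d_model)
        memory = self.encoder(src_emb)

        # Decoder input: a length-adjusted copy of the source embeddings.
        # The thesis enhances this input with token- and embedding-level
        # transformations of the source so it carries more target-side information.
        idx = torch.linspace(0, src_tokens.size(1) - 1, tgt_len).long()
        decoder_input = src_emb[:, idx, :]

        # No causal mask and no left-to-right loop: every target position is
        # predicted in parallel in one pass, which is where the speedup comes from.
        hidden = self.decoder(decoder_input, memory)
        return self.project(hidden).argmax(dim=-1)    # (batch, tgt_len)
```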
2. From the perspective of the decoding algorithm, in order to achieve a better trade-off between translation quality and decoding speed, this thesis studies iterative decoding algorithms. To improve the robustness of the encoder and alleviate the repetitive-translation problem of parallel decoding models, this thesis proposes a jointly masked sequence-to-sequence model that masks both the encoder input and the decoder input during training, equipped with tailored loss functions and decoding algorithms. The model performs comparably to the autoregressive model in translation accuracy while achieving more than a fivefold speedup in decoding.

3. From the perspective of incorporating pre-trained models, this thesis explores how to apply pre-trained language models such as BERT to machine translation. To tackle the challenges that arise when fine-tuning pre-trained language models, such as catastrophic forgetting, model inconsistency, and sensitivity, this thesis proposes a model based on lightweight adapters. The model inserts a lightweight adapter layer into each pre-trained layer, fixes the parameters of the pre-trained layers, and tunes only the parameters of the adapter layers during fine-tuning. Combined with the iterative decoding algorithm, the model outperforms the traditional autoregressive model in translation accuracy while doubling the decoding speed.
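As a rough illustration of the adapter-based fine-tuning described in point 3, the sketch below shows a standard bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection) together with a helper that freezes the pre-trained layers so that only the adapter parameters are updated. The module names, dimensions, and placement are assumptions for illustration, not the exact architecture used in the thesis.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Lightweight bottleneck adapter with a residual connection."""

    def __init__(self, d_model: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Only these few parameters are trained during fine-tuning.
        return hidden + self.up(self.act(self.down(hidden)))


def attach_adapters(pretrained_layers: nn.ModuleList, d_model: int = 768) -> nn.ModuleList:
    # Freeze every pre-trained parameter; fine-tuning then updates adapters only,
    # which mitigates catastrophic forgetting and keeps the tuned model lightweight.
    for layer in pretrained_layers:
        for p in layer.parameters():
            p.requires_grad = False
    # One adapter per pre-trained layer; during the forward pass the i-th adapter
    # is applied to the output of the i-th (frozen) pre-trained layer.
    return nn.ModuleList(Adapter(d_model) for _ in pretrained_layers)
```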
Keywords/Search Tags:Machine Learning, Deep Learning, Natural Language Processing, Machine Translation, Neural Network, Pre-Trained Language Model