
Speculative Decoding In Neural Machine Translation

Posted on: 2024-04-28    Degree: Master    Type: Thesis
Country: China    Candidate: H M Xia    Full Text: PDF
GTID: 2568307070999119    Subject: Software engineering
Abstract/Summary:
Machine translation is an important research field in both natural language processing and artificial intelligence. It aims to automatically transform source-language sentences into the target language while preserving the semantics as faithfully as possible. In recent years, with the continuous development of deep learning, Neural Machine Translation (NMT) models have gradually become the mainstream approach to the task. Compared with statistical machine translation methods, NMT models have achieved significant improvements in translation reliability and accuracy, even reaching human-level performance on various translation tasks. NMT technology has therefore been widely applied in online commercial translation systems such as Google and Baidu to meet the translation needs of a vast number of Internet users. However, since NMT models decode the target sentence word by word in an autoregressive manner, the inference latency of the entire sentence grows with the sentence length. This limits the application and promotion of NMT models in online commercial translation systems.

To tackle this problem, researchers have proposed non-autoregressive translation (NAT) to speed up inference. Compared with autoregressive translation (AT) models, which generate target sentences word by word, NAT decodes multiple tokens of the target sentence in parallel, greatly improving the inference efficiency of the model. However, since NAT models cannot fully exploit the contextual information between target tokens, their translation quality suffers a large degradation compared with AT models. How to balance translation quality against inference efficiency remains the central concern of NAT research.

In this work, inspired by speculative execution in computer architecture, we propose speculative decoding (Spec Dec), a general decoding paradigm that combines the strengths of AT and NAT models. We further provide a detailed analysis of Spec Dec, covering the basic model architecture, the improvement of inference efficiency, and the optimization of translation quality:

1. Inspired by speculative execution in computer architecture, we propose speculative decoding (Spec Dec), a general decoding paradigm for efficient sequence-to-sequence generation. Spec Dec combines the advantages of AT and NAT, significantly improving inference efficiency while ensuring translation quality comparable to AT. Specifically, at each decoding step, Spec Dec first utilizes an efficient NAT model to speculatively draft multiple tokens in parallel, and then uses a high-quality AT model to verify these tokens simultaneously. We design the basic model architecture of Spec Dec and explore multiple verification strategies. Experimental results show that Spec Dec achieves a 3× to 5× speedup with translation quality comparable to AT, outperforming state-of-the-art NAT research in the "speedup-quality" tradeoff.

2. From the perspective of model compression, to further improve the inference efficiency of Spec Dec and fully exploit the potential of NAT, we propose utilizing an n-gram model as an efficient AT verifier. Compared with the basic Spec Dec architecture, the n-gram-based model achieves higher inference efficiency (a 3× to 14× speedup over AT) at some sacrifice of translation quality, making it suitable for application scenarios with high requirements on inference efficiency but lower requirements on translation quality.

3. From the perspective of performance optimization, we propose introducing the Connectionist Temporal Classification (CTC) model into Spec Dec to enhance the alignment between source and target sentences in the NAT model, so as to achieve more accurate speculation. Experimental results show that, compared with the basic Spec Dec architecture, the CTC-based model achieves higher translation quality at the cost of some inference efficiency, making it suitable for application scenarios with high requirements on quality but lower requirements on efficiency.

Moreover, the results demonstrate that, as a general decoding paradigm, Spec Dec can further benefit from leading NAT research.
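The draft-and-verify loop at the heart of Spec Dec can be illustrated with a minimal sketch. This is a hypothetical toy, not the thesis's actual models: `draft` stands in for the NAT drafter, `verify` for the AT verifier's single parallel forward pass, and both are trivial stubs chosen only to make the control flow runnable.

```python
def draft(prefix, k):
    """NAT drafter (stub): speculatively propose k next tokens in parallel."""
    return [f"tok{len(prefix) + i}" for i in range(k)]

def verify(prefix, drafted):
    """AT verifier (stub): one parallel forward pass over prefix + drafted,
    returning the AT model's greedy prediction at each drafted position.
    Here it simply agrees with the draft to keep the sketch self-contained."""
    return list(drafted)

def spec_decode(k=5, max_len=20):
    out = []
    while len(out) < max_len:
        drafted = draft(out, k)          # NAT drafts k tokens in parallel
        preds = verify(out, drafted)     # AT verifies them simultaneously
        # Accept the longest prefix of drafted tokens the AT model agrees
        # with; on the first mismatch, fall back to the AT prediction.
        n_accept = 0
        for d, p in zip(drafted, preds):
            if d != p:
                break
            n_accept += 1
        out.extend(drafted[:n_accept])
        if n_accept < len(drafted):
            out.append(preds[n_accept])  # AT-corrected token
    return out[:max_len]
```

With a well-trained drafter, most drafted tokens are accepted, so each AT forward pass emits several tokens instead of one, which is the source of the speedup reported above.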
Keywords/Search Tags: Deep Learning, Natural Language Processing, Neural Machine Translation, Non-Autoregressive Translation