An Architecture Search Algorithm To Accelerate Transformer On Hardware

Posted on: 2021-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: R Jiao
Full Text: PDF
GTID: 2428330647450842
Subject: Engineering
Abstract/Summary:
This thesis focuses on making the Transformer smaller and faster. A key obstacle to the wide deployment of deep learning methods is that modern models require a large amount of computation at inference time, so it is worthwhile to study how to make them smaller and faster on a target hardware platform. Such an algorithm should consider more than the number of parameters; it is beneficial to optimize model latency directly. The thesis gives a detailed survey of the Transformer and of Neural Architecture Search (NAS) methods.

The thesis presents an algorithm called NAS-Transformer, which has three main components (each is illustrated by a sketch after the abstract):

1. A Transformer-based search space with parameter sharing. The search space exploits the redundancy in the parameters to make the Transformer smaller and faster.

2. A latency prediction model. The model predicts the latency of a network running inference on the target hardware. Inference latency is hard to analyze theoretically, so the model uses machine learning to learn to predict latency automatically.

3. A multi-objective optimization method that optimizes accuracy, latency, and the number of parameters simultaneously, yielding Pareto-optimal solutions.

The main innovations of the algorithm are as follows: first, a search space that generates smaller models by analyzing the redundancy in the Transformer; second, the search process is decoupled from the hardware platform through the latency prediction model, which also minimizes the time spent measuring latency on hardware. The design of the search space and the search strategy is motivated by modern techniques used in CNN acceleration.

The model found by NAS-Transformer achieves high compression ratios with a small drop in performance. On the WMT 2014 EN-DE dataset, it uses 22.6% fewer parameters than the baseline Transformer with a drop of 0.16 BLEU, and its latency is 23.5% lower. The search costs only about three times the time needed to train one Transformer, which is acceptable.
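The parameter-sharing search space can be pictured with a minimal PyTorch sketch: candidate sub-layers of different widths reuse slices of one shared weight matrix, so the entire search space fits in a single set of parameters. The class name, widths, and shapes below are illustrative assumptions, not the thesis's actual implementation.

    import torch
    import torch.nn.functional as F
    from torch import nn

    class ElasticLinear(nn.Module):
        """Linear layer whose candidate widths all share one weight matrix."""
        def __init__(self, max_in, max_out):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(max_out, max_in) * 0.02)
            self.bias = nn.Parameter(torch.zeros(max_out))

        def forward(self, x, out_features):
            # Slice the shared parameters down to the sampled sub-layer size.
            w = self.weight[:out_features, :x.shape[-1]]
            return F.linear(x, w, self.bias[:out_features])

    ffn = ElasticLinear(max_in=512, max_out=2048)
    x = torch.randn(8, 10, 512)
    for width in (512, 1024, 2048):        # candidate feed-forward widths
        print(width, ffn(x, width).shape)  # every width reuses the same weights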
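The latency predictor can be sketched as an ordinary regression problem: encode each candidate architecture as a feature vector, measure its latency on the target hardware once, and fit a model on those pairs so that later candidates are scored without touching the hardware. The feature encoding and the numbers below are placeholders, and a random forest is only one plausible choice of regressor; the abstract does not specify these details.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Hypothetical architecture encoding: [num_layers, d_model, ffn_dim, num_heads].
    X = np.array([[6, 512, 2048, 8],
                  [4, 512, 1024, 8],
                  [6, 256, 1024, 4],
                  [2, 256,  512, 4]])
    y = np.array([120.0, 85.0, 60.0, 25.0])  # measured latencies in ms (placeholders)

    predictor = RandomForestRegressor(n_estimators=100).fit(X, y)
    # New candidates are now scored without running them on the device.
    print(predictor.predict(np.array([[4, 256, 1024, 8]])))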
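Finally, the multi-objective search returns a Pareto front: the set of candidates that no other candidate beats on all objectives at once. A small self-contained sketch of the non-dominated filter follows, using made-up (BLEU, latency, parameter-count) triples rather than the thesis's results.

    def pareto_front(candidates):
        """Keep candidates no other candidate dominates.
        Each entry is (bleu, latency_ms, params_m); BLEU is maximized,
        latency and parameter count are minimized."""
        return [c for c in candidates
                if not any(o != c and o[0] >= c[0] and o[1] <= c[1] and o[2] <= c[2]
                           for o in candidates)]

    archs = [(27.3, 120.0, 65.0), (27.1, 92.0, 50.3),
             (26.0, 95.0, 52.0), (25.5, 70.0, 40.0)]
    print(pareto_front(archs))  # the dominated (26.0, 95.0, 52.0) is filtered out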
Keywords/Search Tags:Transformer, Neural Architecture Search, Neural Network Acceleration, Machine Translation