An Architecture Search Algorithm To Accelerate Transformer On Hardware

Posted on: 2021-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: R Jiao
Full Text: PDF
GTID: 2428330647450842
Subject: Engineering
Abstract/Summary:
This thesis focuses on making the Transformer smaller and faster. A key obstacle to the wide deployment of deep learning methods is that modern models require a large amount of computation at inference time, so it is worthwhile to study how to make them smaller and faster on a target hardware platform. Such an algorithm should consider more than the number of parameters; it is beneficial to optimize model latency directly. The thesis gives a detailed survey of the Transformer and of Neural Architecture Search (NAS) methods.

The thesis presents an algorithm called NAS-Transformer, which has three main components (each is illustrated by a sketch after the abstract):

1. A Transformer-based search space with parameter sharing. The search space exploits the redundancy in the parameters to make the Transformer smaller and faster.

2. A latency prediction model. The model predicts the latency of a network running inference on the target hardware. Inference latency is hard to analyze theoretically, so the model uses machine learning to learn to predict latency automatically.

3. A multi-objective optimization method that optimizes accuracy, latency, and the number of parameters simultaneously, yielding Pareto-optimal solutions.

The main innovations of the algorithm are as follows: first, a search space that generates smaller models by analyzing the redundancy in the Transformer; second, the search process is decoupled from the hardware platform through the latency prediction model, which also minimizes the time spent measuring latency on hardware. The design of the search space and the search strategy is motivated by modern techniques used in CNN acceleration.

The model found by NAS-Transformer achieves high compression ratios with a small drop in performance. On the WMT 2014 EN-DE dataset, it uses 22.6% fewer parameters than the baseline Transformer with a drop of 0.16 BLEU, and its latency is 23.5% lower. The search costs only about three times the time needed to train one Transformer, which is acceptable.
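The parameter-sharing search space can be pictured with a minimal PyTorch sketch: candidate sub-layers of different widths reuse slices of one shared weight matrix, so the entire search space fits in a single set of parameters. The class name, widths, and shapes below are illustrative assumptions, not the thesis's actual implementation.

    import torch
    import torch.nn.functional as F
    from torch import nn

    class ElasticLinear(nn.Module):
        """Linear layer whose candidate widths all share one weight matrix."""
        def __init__(self, max_in, max_out):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(max_out, max_in) * 0.02)
            self.bias = nn.Parameter(torch.zeros(max_out))

        def forward(self, x, out_features):
            # Slice the shared parameters down to the sampled sub-layer size.
            w = self.weight[:out_features, :x.shape[-1]]
            return F.linear(x, w, self.bias[:out_features])

    ffn = ElasticLinear(max_in=512, max_out=2048)
    x = torch.randn(8, 10, 512)
    for width in (512, 1024, 2048):        # candidate feed-forward widths
        print(width, ffn(x, width).shape)  # every width reuses the same weights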
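The latency predictor can be sketched as an ordinary regression problem: encode each candidate architecture as a feature vector, measure its latency on the target hardware once, and fit a model on those pairs so that later candidates are scored without touching the hardware. The feature encoding and the numbers below are placeholders, and a random forest is only one plausible choice of regressor; the abstract does not specify these details.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Hypothetical architecture encoding: [num_layers, d_model, ffn_dim, num_heads].
    X = np.array([[6, 512, 2048, 8],
                  [4, 512, 1024, 8],
                  [6, 256, 1024, 4],
                  [2, 256,  512, 4]])
    y = np.array([120.0, 85.0, 60.0, 25.0])  # measured latencies in ms (placeholders)

    predictor = RandomForestRegressor(n_estimators=100).fit(X, y)
    # New candidates are now scored without running them on the device.
    print(predictor.predict(np.array([[4, 256, 1024, 8]])))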
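Finally, the multi-objective search returns a Pareto front: the set of candidates that no other candidate beats on all objectives at once. A small self-contained sketch of the non-dominated filter follows, using made-up (BLEU, latency, parameter-count) triples rather than the thesis's results.

    def pareto_front(candidates):
        """Keep candidates no other candidate dominates.
        Each entry is (bleu, latency_ms, params_m); BLEU is maximized,
        latency and parameter count are minimized."""
        return [c for c in candidates
                if not any(o != c and o[0] >= c[0] and o[1] <= c[1] and o[2] <= c[2]
                           for o in candidates)]

    archs = [(27.3, 120.0, 65.0), (27.1, 92.0, 50.3),
             (26.0, 95.0, 52.0), (25.5, 70.0, 40.0)]
    print(pareto_front(archs))  # the dominated (26.0, 95.0, 52.0) is filtered out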
Keywords/Search Tags:Transformer, Neural Architecture Search, Neural Network Acceleration, Machine Translation