
Design And Optimization Of Configurable Hardware Accelerator For LSTM Neural Network

Posted on: 2020-07-24    Degree: Master    Type: Thesis
Country: China    Candidate: X W Zhu    Full Text: PDF
GTID: 2518305732473764    Subject: IC Engineering
Abstract/Summary:
In recent years, artificial intelligence (AI), and deep learning in particular, has developed rapidly thanks to the abundance of open-source training datasets and the computing power available to learn from them (notably the rise and application of GPUs). LSTM-based neural networks are a class of recurrent neural networks widely used in cognitive tasks such as speech recognition, image captioning, and natural language processing. However, traditional computing platforms such as CPUs and GPUs cannot execute the LSTM algorithm quickly and must transfer large amounts of weight data, which requires a large on-chip memory and consumes considerable power. Hence, a flexible, fast, and low-power hardware accelerator for the LSTM algorithm is needed to meet the growing demands of AI workloads.

This thesis designs a configurable hardware accelerator for LSTM neural networks on a Xilinx FPGA platform and applies it to the tasks of aircraft model recognition and track prediction. First, the thesis surveys existing LSTM hardware accelerator designs and reviews their innovations and acceleration results. It then describes the proposed accelerator in detail. On one hand, a configurable LSTM controller is designed to meet the computing requirements of LSTM models of different sizes. On the other hand, targeting the sparse and quantized compressed model, a configurable, high-performance sparse matrix-vector multiplication unit is designed that skips zero activations, raising the speed of LSTM computation and achieving high energy efficiency.

Next, the FPGA-based verification environment and methodology are introduced, and the test results are described and analyzed in detail. The results show that the accelerator reaches 2135 frames per second on the aircraft model recognition task and an average of 15363 frames per second on the track prediction task, while its total power consumption is only 6.215 W. The sparse matrix-vector multiplication (SpMV) unit achieves a 26.27x speedup over a CPU at only 16% of the CPU's power, and a 2.27x speedup over a GPU at 3% of the GPU's power; moreover, the accelerator delivers 5.5 times the energy efficiency of ESE [16]. Finally, the thesis proposes optimizations for the remaining weaknesses of the design, including alleviating the mismatch between data transfer bandwidth and the speed of the on-chip computing units, and simplifying the nonlinear function unit by exploiting the symmetry of the activation functions.
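To illustrate the zero-activation skipping described above, the following C sketch shows the general idea in software; it is not the thesis's RTL design, and the column-wise compressed (CSC-like) weight storage, array names, and example values are assumptions made for illustration. Storing the sparse weight matrix by columns lets the unit discard, in a single check, every weight column that would be multiplied by a zero activation.

```c
/* Illustrative sketch only (assumed CSC-like storage, not the thesis RTL):
 * activation-driven sparse matrix-vector multiplication y = W * x that
 * skips whole weight columns whenever the corresponding activation is zero. */
#include <stdio.h>

#define ROWS 4
#define COLS 4

/* Example sparse 4x4 weight matrix in column-compressed form (assumed values). */
static const float vals[]            = {0.5f, -1.0f, 2.0f, 0.25f, -0.75f};
static const int   row_idx[]         = {0,    2,     1,    3,     2};
static const int   col_ptr[COLS + 1] = {0, 2, 3, 3, 5}; /* column 2 is empty */

static void spmv_skip_zero(const float *x, float *y)
{
    for (int i = 0; i < ROWS; ++i)
        y[i] = 0.0f;

    for (int j = 0; j < COLS; ++j) {
        if (x[j] == 0.0f)       /* zero activation: skip the whole column */
            continue;
        for (int k = col_ptr[j]; k < col_ptr[j + 1]; ++k)
            y[row_idx[k]] += vals[k] * x[j];
    }
}

int main(void)
{
    const float x[COLS] = {1.0f, 0.0f, 3.0f, -2.0f}; /* sparse activation vector */
    float y[ROWS];

    spmv_skip_zero(x, y);
    for (int i = 0; i < ROWS; ++i)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}
```

For the nonlinear function optimization mentioned at the end of the abstract, the relevant symmetries are sigmoid(-x) = 1 - sigmoid(x) and tanh(-x) = -tanh(x), so a lookup-table or piecewise unit only needs to store values for non-negative inputs and can derive the rest, roughly halving the storage; how this is mapped to hardware in the thesis is not detailed in the abstract.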
Keywords/Search Tags:LSTM, Configurable, Sparse and Quantization Compression, Hardware Accelerator, FPGA, High-Speed, Low-Power