
Research On LSTM Compression And FPGA Acceleration

Posted on: 2021-04-28    Degree: Master    Type: Thesis
Country: China    Candidate: S Pang    Full Text: PDF
GTID: 2518306050954549    Subject: Measuring and Testing Technology and Instruments
Abstract/Summary:
Recurrent neural networks (RNNs) have achieved good performance in many natural language processing tasks, such as machine translation, speech recognition, and text prediction. As a variant of the RNN, the Long Short-Term Memory (LSTM) network adds three gate units to the recurrent cell, giving it stronger long-term memory and making it suitable for complex learning problems, but at the cost of high computational complexity and large storage requirements. As the computing power and flexibility of field programmable gate arrays (FPGAs) have grown, FPGA acceleration of neural network algorithms has received widespread attention. However, the on-chip storage resources of FPGAs are limited, so realizing a high-performance, low-power LSTM within these storage constraints is of significant research value.

An LSTM model usually occupies a large amount of storage space, making it difficult to fit into the limited on-chip RAM (Random Access Memory) of an FPGA. To solve this problem, this thesis studies the LSTM model, proposes a compression strategy, and implements an FPGA-based LSTM accelerator through software-hardware co-design. First, the weight distribution of the LSTM is analyzed, and a weight matrix structured pruning algorithm is selected to obtain a sparse matrix that is amenable to hardware acceleration. Second, mixed-precision quantization of the LSTM is proposed, which reduces the storage requirements of multi-layer LSTM networks as far as possible; the structured pruning and mixed-precision quantization are then combined to maximize the compression of multi-layer LSTMs. Third, according to the characteristics of the compressed sparse matrix, the CSC (Compressed Sparse Column) storage format is optimized to further reduce storage occupation. Finally, a multi-layer LSTM accelerator is implemented on a Xilinx Zynq-series FPGA: the PS (Processing System) part of the Zynq performs data pre-processing and the Softmax function, while the PL (Programmable Logic) part performs the LSTM inference computation.

The effectiveness of the proposed compression strategy is verified through a handwritten letter recognition experiment and a language model experiment. In the language model experiment, the proposed strategy compresses the model by 53.3 times while matching the perplexity of a model compressed by 40 times with the traditional compression method. In the handwritten letter recognition experiment, the proposed strategy compresses the model by 42.6 times while matching the accuracy of a model compressed by 32 times with the traditional method. Finally, the handwritten letter recognition experiment is implemented on a Xilinx Zynq 7020, and the energy efficiency of the designed LSTM accelerator is measured and compared with CPU and GPU platforms running the same experiment. The results show that the LSTM accelerator achieves 387.72 times and 8.41 times better energy efficiency than the CPU and GPU, respectively. The design method in this thesis can also serve as a reference for other types of neural networks, such as the Convolutional Neural Network (CNN) and the Gated Recurrent Unit (GRU).
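As a rough illustration of the pruning-plus-quantization pipeline summarized above, the sketch below applies block-structured pruning to a weight matrix and then quantizes the surviving weights to a fixed-point grid. This is a minimal, self-contained example under stated assumptions, not the thesis's actual implementation: the block size, keep ratio, bit width, and function names (block_structured_prune, quantize_fixed_point) are illustrative choices only.

```python
# Minimal sketch (illustrative, not the thesis's code): block-structured pruning
# of an LSTM gate weight matrix followed by uniform fixed-point quantization.
import numpy as np

def block_structured_prune(W, block=4, keep_ratio=0.3):
    """Zero out whole column blocks of W with the smallest L2 norm, so the
    surviving regular structure maps cleanly onto parallel hardware lanes."""
    rows, cols = W.shape
    assert cols % block == 0
    blocks = W.reshape(rows, cols // block, block)      # group adjacent columns into blocks
    norms = np.linalg.norm(blocks, axis=(0, 2))         # one importance score per block
    n_keep = max(1, int(keep_ratio * norms.size))
    mask = np.zeros(norms.size, dtype=bool)
    mask[np.argsort(norms)[-n_keep:]] = True            # keep the largest-norm blocks
    pruned = blocks * mask[None, :, None]                # zero out the discarded blocks
    return pruned.reshape(rows, cols), mask

def quantize_fixed_point(W, bits=8):
    """Uniform symmetric quantization of W onto a signed fixed-point grid."""
    scale = np.max(np.abs(W)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return W.copy(), scale
    q = np.clip(np.round(W / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale, scale

# Example: prune and quantize a random stand-in for one LSTM gate weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 256)).astype(np.float32)
W_pruned, mask = block_structured_prune(W, block=4, keep_ratio=0.3)
W_q, scale = quantize_fixed_point(W_pruned, bits=8)      # e.g. 8-bit for this layer
print(f"kept blocks: {mask.sum()}/{mask.size}, quant scale: {scale:.4f}")
```

In a mixed-precision setting, the bit width passed to the quantizer would differ per layer or per matrix (e.g. wider for sensitive layers, narrower elsewhere), and the zeroed blocks would be stored in a compressed sparse format such as the optimized CSC layout mentioned above.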
Keywords/Search Tags: LSTM, weight matrix structured pruning, mixed-precision quantization, FPGA