
Research On LSTM Compression And FPGA Acceleration

Posted on: 2021-04-28    Degree: Master    Type: Thesis
Country: China    Candidate: S Pang    Full Text: PDF
GTID: 2518306050954549    Subject: Measuring and Testing Technology and Instruments
Abstract/Summary:
Recurrent neural networks (RNNs) have achieved good performance in many natural language processing tasks, such as machine translation, speech recognition, and text prediction. As a variant of the RNN, the Long Short-Term Memory (LSTM) network adds three gate units to the recurrent cell, giving it stronger long-term memory and making it suitable for complex learning problems, but at the cost of high computational complexity and large storage requirements. As the computing power and flexibility of field programmable gate arrays (FPGAs) have grown, FPGA acceleration of neural network algorithms has received widespread attention. However, the on-chip storage resources of FPGAs are limited, so realizing a high-performance, low-power LSTM within these storage constraints is of significant research value.

An LSTM model usually occupies a large amount of storage space, making it difficult to fit into the limited on-chip RAM (Random Access Memory) of an FPGA. To solve this problem, this thesis studies the LSTM model, proposes a compression strategy, and implements an FPGA-based LSTM accelerator through software-hardware co-design. First, the weight distribution of the LSTM is analyzed, and a weight matrix structured pruning algorithm is selected to obtain a sparse matrix that is amenable to hardware acceleration. Second, mixed-precision quantization of the LSTM is proposed, which reduces the storage requirements of multi-layer LSTM networks as far as possible; the structured pruning and mixed-precision quantization are then combined to maximize the compression of multi-layer LSTMs. Third, according to the characteristics of the compressed sparse matrix, the CSC (Compressed Sparse Column) storage format is optimized to further reduce storage occupation. Finally, a multi-layer LSTM accelerator is implemented on a Xilinx Zynq-series FPGA: the PS (Processing System) part of the Zynq performs data pre-processing and the Softmax function, while the PL (Programmable Logic) part performs the LSTM inference computation.

The effectiveness of the proposed compression strategy is verified through a handwritten letter recognition experiment and a language model experiment. In the language model experiment, the proposed strategy compresses the model by 53.3 times while matching the perplexity of a model compressed by 40 times with the traditional compression method. In the handwritten letter recognition experiment, the proposed strategy compresses the model by 42.6 times while matching the accuracy of a model compressed by 32 times with the traditional method. Finally, the handwritten letter recognition experiment is implemented on a Xilinx Zynq 7020, and the energy efficiency of the designed LSTM accelerator is measured and compared with CPU and GPU platforms running the same experiment. The results show that the LSTM accelerator achieves 387.72 times and 8.41 times better energy efficiency than the CPU and GPU, respectively. The design method in this thesis can also serve as a reference for other types of neural networks, such as the Convolutional Neural Network (CNN) and the Gated Recurrent Unit (GRU).
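As a rough illustration of the pruning-plus-quantization pipeline summarized above, the sketch below applies block-structured pruning to a weight matrix and then quantizes the surviving weights to a fixed-point grid. This is a minimal, self-contained example under stated assumptions, not the thesis's actual implementation: the block size, keep ratio, bit width, and function names (block_structured_prune, quantize_fixed_point) are illustrative choices only.

```python
# Minimal sketch (illustrative, not the thesis's code): block-structured pruning
# of an LSTM gate weight matrix followed by uniform fixed-point quantization.
import numpy as np

def block_structured_prune(W, block=4, keep_ratio=0.3):
    """Zero out whole column blocks of W with the smallest L2 norm, so the
    surviving regular structure maps cleanly onto parallel hardware lanes."""
    rows, cols = W.shape
    assert cols % block == 0
    blocks = W.reshape(rows, cols // block, block)      # group adjacent columns into blocks
    norms = np.linalg.norm(blocks, axis=(0, 2))         # one importance score per block
    n_keep = max(1, int(keep_ratio * norms.size))
    mask = np.zeros(norms.size, dtype=bool)
    mask[np.argsort(norms)[-n_keep:]] = True            # keep the largest-norm blocks
    pruned = blocks * mask[None, :, None]                # zero out the discarded blocks
    return pruned.reshape(rows, cols), mask

def quantize_fixed_point(W, bits=8):
    """Uniform symmetric quantization of W onto a signed fixed-point grid."""
    scale = np.max(np.abs(W)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return W.copy(), scale
    q = np.clip(np.round(W / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale, scale

# Example: prune and quantize a random stand-in for one LSTM gate weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 256)).astype(np.float32)
W_pruned, mask = block_structured_prune(W, block=4, keep_ratio=0.3)
W_q, scale = quantize_fixed_point(W_pruned, bits=8)      # e.g. 8-bit for this layer
print(f"kept blocks: {mask.sum()}/{mask.size}, quant scale: {scale:.4f}")
```

In a mixed-precision setting, the bit width passed to the quantizer would differ per layer or per matrix (e.g. wider for sensitive layers, narrower elsewhere), and the zeroed blocks would be stored in a compressed sparse format such as the optimized CSC layout mentioned above.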
Keywords/Search Tags: LSTM, weight matrix structured pruning, mixed-precision quantization, FPGA