Recurrent neural networks (RNNs) have been widely used in automatic speech recognition (ASR) and natural language processing (NLP). To accelerate RNN inference, previous works have proposed various optimization methods. Weight pruning is a widely used strategy that speeds up RNNs by constructing sparse weight matrices and omitting the computation and storage of zero elements during inference. However, the non-zero elements that remain after unstructured pruning are randomly distributed, which leads to unbalanced computation and memory conflicts and thus fails to reach the ideal speed-up ratio. Bank-Balanced Sparsity (BBS) is an efficient compression method with a balanced distribution of non-zero elements and negligible accuracy degradation, but it incurs considerable additional memory overhead to store the indices, which limits the compression ratio. Other works exploit the input similarity of time-series tasks to reduce computation, but their similarity-check algorithms suffer from either high complexity or large error accumulation, and they do not exploit weight sparsity at the same time, leaving considerable room for optimization.

This paper presents an acceleration scheme for recurrent neural networks that combines balanced sparsity with input-similarity-based skipping. For the weights, a Shared Index Bank-Balanced Sparsity (SIBBS) compression method is proposed. The rows of a weight matrix are divided into multiple bank clusters to balance the distribution of non-zero weights, and the banks within a cluster share their indices. Compared with BBS, the index cost is reduced by 2-8x, while accuracy decreases by only 0.9% on LibriSpeech and 0.4% on TIMIT.

For the inputs, a coarse-grained skipping algorithm, the fixed input similarity-based skipping algorithm, is proposed to exploit the balanced pruning of SIBBS. It compares the similarity between the current input and the first input after a skipping failure, which accumulates less error than algorithms based on the similarity of adjacent inputs. In addition, the similarity formula proposed in this paper matches the precision of existing formulas while greatly reducing the computational complexity. When this algorithm reduces LSTM operations by 10%, accuracy decreases by 0.55%-1.90% on the LibriSpeech test set and 0.42%-0.88% on the TIMIT test set, with negligible computational overhead.

Finally, an accelerator architecture that applies SIBBS and the fixed input similarity-based skipping algorithm is proposed. The accelerator includes a sparse matrix-vector multiplication unit and a similarity-check unit to execute the two algorithms, reducing computation from both the weight-matrix and input-vector sides. The accelerator is implemented on a Xilinx XCKU115 FPGA. Compared with state-of-the-art FPGA-based LSTM accelerators, it achieves a 1.47x-79.5x reduction in latency without accuracy loss. When performing continuous LSTM computations, the average latency is further reduced by the input similarity-based skipping algorithm.
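The following is a minimal sketch of the index-sharing idea behind SIBBS. The bank size, cluster size, number of kept weights, and the magnitude-sum selection rule used below are illustrative assumptions, not the paper's exact configuration; the point is only that all rows of a cluster keep non-zeros at the same column positions within each bank, so one index list serves the whole cluster.

```python
import numpy as np

def sibbs_prune(W, bank_size=8, cluster_rows=4, k=2):
    """Prune W so that, within each bank of columns, every row of a cluster
    keeps non-zeros at the same k column offsets (shared index).
    Returns the pruned matrix and the shared index table."""
    rows, cols = W.shape
    assert rows % cluster_rows == 0 and cols % bank_size == 0
    W_pruned = np.zeros_like(W)
    # shared_idx[c][b]: the k column offsets kept by cluster c in bank b
    shared_idx = [[None] * (cols // bank_size) for _ in range(rows // cluster_rows)]

    for c in range(rows // cluster_rows):
        r0, r1 = c * cluster_rows, (c + 1) * cluster_rows
        for b in range(cols // bank_size):
            c0 = b * bank_size
            bank = W[r0:r1, c0:c0 + bank_size]
            # Illustrative selection rule: keep the k columns with the
            # largest summed magnitude over the whole cluster.
            keep = np.sort(np.argsort(np.abs(bank).sum(axis=0))[-k:])
            shared_idx[c][b] = keep
            W_pruned[r0:r1, c0 + keep] = bank[:, keep]
    return W_pruned, shared_idx
```

Because every row in a cluster reuses the same index list, the index storage per bank shrinks from one list per row (as in BBS) to one list per cluster, which is the source of the index-cost reduction reported above.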
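A minimal sketch of the fixed input similarity-based skipping idea is shown below. The paper's own similarity formula and threshold are not reproduced here, so the relative L1 distance and the threshold value are stand-in assumptions, and the recurrent state handling is omitted for clarity; the point illustrated is that inputs are compared against a fixed reference (the first input after the last skipping failure) rather than against the immediately preceding input, which limits error accumulation.

```python
import numpy as np

def run_with_fixed_reference_skipping(inputs, lstm_step, threshold=0.05):
    """Process a sequence of input vectors, skipping the LSTM step whenever
    the current input is similar enough to a fixed reference input.
    The reference is updated only on a skipping failure, i.e. when the
    check decides the full computation must be run."""
    outputs = []
    reference_x = None   # first input after the last skipping failure
    reference_y = None   # its LSTM output, reused when a step is skipped
    for x in inputs:
        if reference_x is not None:
            # Stand-in similarity measure: relative L1 distance to the
            # fixed reference (not the paper's actual formula).
            diff = np.abs(x - reference_x).sum() / (np.abs(reference_x).sum() + 1e-8)
            if diff < threshold:
                outputs.append(reference_y)   # skip: reuse the reference output
                continue
        # Skipping failure: run the full LSTM step and reset the reference.
        y = lstm_step(x)
        reference_x, reference_y = x, y
        outputs.append(y)
    return outputs
```

Comparing against a fixed reference means the approximation error of each skipped step is bounded by a single comparison to a fully computed input, whereas adjacent-input schemes let small per-step differences accumulate across a run of skips.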