
Design Of BNN Hardware Accelerator Based On Pre-Calculation

Posted on: 2022-05-27    Degree: Master    Type: Thesis
Country: China    Candidate: B Y Chen    Full Text: PDF
GTID: 2518306560480044    Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
In recent years, Convolutional Neural Networks (CNNs) have become increasingly widespread in our lives. The huge amounts of data and computation in CNN models have gradually become the main factors hindering the development of neural networks. As a lightweight class of neural network, Binary Neural Networks (BNNs) can greatly reduce the amount of data and computation compared with CNNs. Owing to these advantages, the design of dedicated BNN hardware accelerators has become a new research hotspot. However, research on BNN hardware accelerators still faces the following challenges: (1) full-precision input images cause additional resource overhead; (2) the traditional zero-value edge-padding method makes the data tri-valued; (3) there is a great deal of redundant computation in the convolutional layers. In view of these challenges, this paper proposes a high-performance FPGA-based BNN hardware accelerator. The main contributions are as follows.

(1) A full binarization method is proposed. In the training stage of the BNN, the data in the full-precision dataset are separated bitwise to obtain a binarized dataset, and a network model with binarized inputs is trained on it, avoiding the resource waste caused by full-precision input images. In the inference stage, an odd-even channel cross-padding method achieves the same effect as traditional zero-value padding, solving the problem of tri-valued data. After applying these two methods, a fully binarized neural network is obtained (both ideas are sketched after this abstract).

(2) A row convolution lookup table (LUT) method based on reuse during the convolution process is proposed. Because the weight data have a high repetition rate, the convolutional layers contain a great deal of redundant computation, and the row convolution LUT method skips it. Furthermore, as the convolution kernel moves, the row convolution LUT entries corresponding to repeated input data are reused, further reducing the amount of computation. In addition, a ping-pong scheme is applied to the writing and reading of the LUT, maximizing the utilization of computing resources and reducing the number of computation cycles (see the second sketch below). Experimental results show that, compared with the existing optimization scheme, the proposed LUT method reduces the amount of computation by 11.72% and the number of computation cycles by 59.48%.

(3) A high-performance, configurable, fully binary neural network hardware accelerator is proposed. Based on the odd-even channel cross-padding method and the row convolution LUT method, a configurable binary computation array is designed, and the computation of each layer is controlled by a layer configuration chain to improve resource utilization. In addition, to reduce on-chip BRAM resources and the amount of BRAM data access, a dataflow that combines the row convolution LUT with an output-stationary scheme is adopted (a loop-nest sketch of output-stationary ordering follows below). Experimental results show that the designed accelerator achieves a resource efficiency of 144.2 GOPS/kLUT and an energy efficiency of 3507 GOPS/W.
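To make the two binarization ideas concrete, here is a minimal Python sketch. The bitwise separation splits each 8-bit pixel into eight binary input planes, and the cross-padding fills even-indexed channels with +1 and odd-indexed channels with -1; the pairing convention and the 8-bit width are assumptions for illustration, one plausible reading of how the odd-even method emulates zero padding, not the thesis's exact scheme.

```python
import numpy as np

def bitwise_separate(images, bits=8):
    """Split unsigned 8-bit images (N, H, W) into binary bit planes
    (N, bits, H, W): plane k holds bit k of each pixel, so the first
    layer sees only binary inputs instead of full-precision pixels."""
    images = images.astype(np.uint8)
    planes = [(images >> k) & 1 for k in range(bits)]
    return np.stack(planes, axis=1)  # values in {0, 1}

def cross_pad(x, pad=1):
    """Pad binary feature maps (C, H, W), already mapped to {-1, +1},
    with +1 on even channels and -1 on odd channels, so that border
    contributions tend to cancel across channel pairs in the
    accumulation instead of introducing a third value (assumed
    interpretation of odd-even channel cross-padding)."""
    C, H, W = x.shape
    out = np.empty((C, H + 2 * pad, W + 2 * pad), dtype=x.dtype)
    for c in range(C):
        fill = 1 if c % 2 == 0 else -1
        out[c] = np.pad(x[c], pad, constant_values=fill)
    return out
```

Either way, the padded maps stay strictly binary, which is the property the method needs for a fully binarized datapath.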
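The row convolution LUT idea can be illustrated in software as follows. A K-bit binary input window has only 2**K possible patterns, so the dot product of every pattern with every kernel row can be precomputed once; the row lookups are then reused by every output row the input row contributes to as the kernel slides downward. This is a behavioral sketch only; the thesis targets an FPGA datapath with ping-pong LUT buffers, which plain Python cannot express.

```python
import numpy as np

def row_conv_lut(x, w):
    """'Valid' binary convolution via a row convolution LUT (sketch).
    x: (H, W) with entries in {0, 1} encoding {-1, +1};
    w: (K, K) with entries in {-1, +1}."""
    H, W = x.shape
    K = w.shape[0]
    OH, OW = H - K + 1, W - K + 1

    # Precompute lut[p, r] = dot(K-bit pattern p mapped to +/-1, kernel row r).
    lut = np.zeros((1 << K, K), dtype=np.int32)
    for p in range(1 << K):
        bits = np.array([1 if (p >> (K - 1 - c)) & 1 else -1
                         for c in range(K)], dtype=np.int32)
        for r in range(K):
            lut[p, r] = int(bits @ w[r])

    # Encode every horizontal K-bit window of every input row as a LUT index;
    # repeated windows map to the same entry, skipping redundant dot products.
    idx = np.zeros((H, OW), dtype=np.int32)
    for i in range(H):
        for j in range(OW):
            p = 0
            for c in range(K):
                p = (p << 1) | int(x[i, j + c])
            idx[i, j] = p

    # Each input row's lookups are reused by all K output rows it overlaps,
    # mirroring the reuse of row results as the kernel moves downward.
    y = np.zeros((OH, OW), dtype=np.int32)
    for oy in range(OH):
        for r in range(K):
            y[oy] += lut[idx[oy + r], r]
    return y
```

In hardware, the lookups for one output row can be read while the table entries for the next row are being written, which is the ping-pong arrangement the abstract describes for keeping the compute array busy.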
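Finally, the output-stationary side of the dataflow can be pictured as a loop nest in which each output accumulator is held locally for the entire reduction and written back exactly once, which is what keeps partial sums out of BRAM. The nesting below is a generic illustration of output-stationary ordering, not the thesis's actual schedule.

```python
import numpy as np

def output_stationary_conv(x, w):
    """Generic output-stationary loop nest ('valid' convolution).
    x: (C, H, W) inputs; w: (F, C, K, K) kernels."""
    F, C, K, _ = w.shape
    _, H, W = x.shape
    OH, OW = H - K + 1, W - K + 1
    y = np.zeros((F, OH, OW), dtype=np.int32)
    for f in range(F):
        for oy in range(OH):
            for ox in range(OW):
                acc = 0  # output-stationary accumulator, held locally
                for c in range(C):
                    for ky in range(K):
                        for kx in range(K):
                            acc += int(x[c, oy + ky, ox + kx] * w[f, c, ky, kx])
                y[f, oy, ox] = acc  # single write-back per output value
    return y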
Keywords/Search Tags: Binary Neural Network, Row Convolution LUT, Hardware Accelerator