
Design Of BNN Hardware Accelerator Based On Pre-Calculation

Posted on: 2022-05-27    Degree: Master    Type: Thesis
Country: China    Candidate: B Y Chen    Full Text: PDF
GTID: 2518306560480044    Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
In recent years, Convolutional Neural Networks (CNNs) have become increasingly widespread in our lives. The huge amounts of data and computation in CNN models have gradually become the main factors hindering the development of neural networks. As a lightweight class of neural network, Binary Neural Networks (BNNs) can greatly reduce the amount of data and computation compared with CNNs. Owing to these advantages, the design of dedicated BNN hardware accelerators has become a new research hotspot. However, research on BNN hardware accelerators still faces the following challenges: (1) full-precision input images cause additional resource overhead; (2) the traditional zero-value edge-padding method makes the data tri-valued; (3) there is a great deal of redundant computation in the convolutional layers. In view of these challenges, this paper proposes a high-performance FPGA-based BNN hardware accelerator. The main contributions are as follows.

(1) A full binarization method is proposed. In the training stage of the BNN, the data in the full-precision dataset are separated bitwise to obtain a binarized dataset, and a network model with binarized inputs is trained on it, avoiding the resource waste caused by full-precision input images. In the inference stage, an odd-even channel cross-padding method achieves the same effect as traditional zero-value padding, solving the problem of tri-valued data. After applying these two methods, a fully binarized neural network is obtained (both ideas are sketched after this abstract).

(2) A row convolution lookup table (LUT) method based on reuse during the convolution process is proposed. Because the weight data have a high repetition rate, the convolutional layers contain a great deal of redundant computation, and the row convolution LUT method skips it. Furthermore, as the convolution kernel moves, the row convolution LUT entries corresponding to repeated input data are reused, further reducing the amount of computation. In addition, a ping-pong scheme is applied to the writing and reading of the LUT, maximizing the utilization of computing resources and reducing the number of computation cycles (see the second sketch below). Experimental results show that, compared with the existing optimization scheme, the proposed LUT method reduces the amount of computation by 11.72% and the number of computation cycles by 59.48%.

(3) A high-performance, configurable, fully binary neural network hardware accelerator is proposed. Based on the odd-even channel cross-padding method and the row convolution LUT method, a configurable binary computation array is designed, and the computation of each layer is controlled by a layer configuration chain to improve resource utilization. In addition, to reduce on-chip BRAM resources and the amount of BRAM data access, a dataflow that combines the row convolution LUT with an output-stationary scheme is adopted (a loop-nest sketch of output-stationary ordering follows below). Experimental results show that the designed accelerator achieves a resource efficiency of 144.2 GOPS/kLUT and an energy efficiency of 3507 GOPS/W.
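To make the two binarization ideas concrete, here is a minimal Python sketch. The bitwise separation splits each 8-bit pixel into eight binary input planes, and the cross-padding fills even-indexed channels with +1 and odd-indexed channels with -1; the pairing convention and the 8-bit width are assumptions for illustration, one plausible reading of how the odd-even method emulates zero padding, not the thesis's exact scheme.

```python
import numpy as np

def bitwise_separate(images, bits=8):
    """Split unsigned 8-bit images (N, H, W) into binary bit planes
    (N, bits, H, W): plane k holds bit k of each pixel, so the first
    layer sees only binary inputs instead of full-precision pixels."""
    images = images.astype(np.uint8)
    planes = [(images >> k) & 1 for k in range(bits)]
    return np.stack(planes, axis=1)  # values in {0, 1}

def cross_pad(x, pad=1):
    """Pad binary feature maps (C, H, W), already mapped to {-1, +1},
    with +1 on even channels and -1 on odd channels, so that border
    contributions tend to cancel across channel pairs in the
    accumulation instead of introducing a third value (assumed
    interpretation of odd-even channel cross-padding)."""
    C, H, W = x.shape
    out = np.empty((C, H + 2 * pad, W + 2 * pad), dtype=x.dtype)
    for c in range(C):
        fill = 1 if c % 2 == 0 else -1
        out[c] = np.pad(x[c], pad, constant_values=fill)
    return out
```

Either way, the padded maps stay strictly binary, which is the property the method needs for a fully binarized datapath.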
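The row convolution LUT idea can be illustrated in software as follows. A K-bit binary input window has only 2**K possible patterns, so the dot product of every pattern with every kernel row can be precomputed once; the row lookups are then reused by every output row the input row contributes to as the kernel slides downward. This is a behavioral sketch only; the thesis targets an FPGA datapath with ping-pong LUT buffers, which plain Python cannot express.

```python
import numpy as np

def row_conv_lut(x, w):
    """'Valid' binary convolution via a row convolution LUT (sketch).
    x: (H, W) with entries in {0, 1} encoding {-1, +1};
    w: (K, K) with entries in {-1, +1}."""
    H, W = x.shape
    K = w.shape[0]
    OH, OW = H - K + 1, W - K + 1

    # Precompute lut[p, r] = dot(K-bit pattern p mapped to +/-1, kernel row r).
    lut = np.zeros((1 << K, K), dtype=np.int32)
    for p in range(1 << K):
        bits = np.array([1 if (p >> (K - 1 - c)) & 1 else -1
                         for c in range(K)], dtype=np.int32)
        for r in range(K):
            lut[p, r] = int(bits @ w[r])

    # Encode every horizontal K-bit window of every input row as a LUT index;
    # repeated windows map to the same entry, skipping redundant dot products.
    idx = np.zeros((H, OW), dtype=np.int32)
    for i in range(H):
        for j in range(OW):
            p = 0
            for c in range(K):
                p = (p << 1) | int(x[i, j + c])
            idx[i, j] = p

    # Each input row's lookups are reused by all K output rows it overlaps,
    # mirroring the reuse of row results as the kernel moves downward.
    y = np.zeros((OH, OW), dtype=np.int32)
    for oy in range(OH):
        for r in range(K):
            y[oy] += lut[idx[oy + r], r]
    return y
```

In hardware, the lookups for one output row can be read while the table entries for the next row are being written, which is the ping-pong arrangement the abstract describes for keeping the compute array busy.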
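Finally, the output-stationary side of the dataflow can be pictured as a loop nest in which each output accumulator is held locally for the entire reduction and written back exactly once, which is what keeps partial sums out of BRAM. The nesting below is a generic illustration of output-stationary ordering, not the thesis's actual schedule.

```python
import numpy as np

def output_stationary_conv(x, w):
    """Generic output-stationary loop nest ('valid' convolution).
    x: (C, H, W) inputs; w: (F, C, K, K) kernels."""
    F, C, K, _ = w.shape
    _, H, W = x.shape
    OH, OW = H - K + 1, W - K + 1
    y = np.zeros((F, OH, OW), dtype=np.int32)
    for f in range(F):
        for oy in range(OH):
            for ox in range(OW):
                acc = 0  # output-stationary accumulator, held locally
                for c in range(C):
                    for ky in range(K):
                        for kx in range(K):
                            acc += int(x[c, oy + ky, ox + kx] * w[f, c, ky, kx])
                y[f, oy, ox] = acc  # single write-back per output value
    return y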
Keywords/Search Tags: Binary Neural Network, Row Convolution LUT, Hardware Accelerator