
VLSI Architecture Design For Binary Convolutional Neural Network Accelerator

Posted on: 2021-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: B C Liu
Full Text: PDF
GTID: 2428330602497449
Subject: Electronic Science and Technology
Abstract/Summary:
Convolutional neural networks (CNNs) are widely used in image classification, and their network sizes keep growing. As a result, on-chip multiplier resources struggle to supply the high parallelism that CNN acceleration demands, and on-chip memory struggles to hold the growing volume of floating-point weight parameters. A binary CNN (BCNN) is a quantized CNN whose weights are +1 or -1, so convolution can be computed without multiplication, and the binary weights sharply reduce weight storage. This thesis exploits these properties of BCNNs to design low-power, highly parallel, and highly efficient VLSI architectures for BCNN acceleration. After synthesis and implementation, the accelerators are programmed onto the FPGA for board-level verification. The main contributions of this work are as follows.

(1) A binarized coding activation function is designed that, in the feedforward pass, replaces the multiplications of the intermediate batch normalization (BN) layer with XNOR and comparison operations, reducing the number of multiplications. The function also quantizes the intermediate feature maps of the fully binary convolution layers as integers, reducing intermediate feature-map memory. (Illustrative sketches of both mechanisms follow this abstract.)

(2) For binary input images, a BNET-6 accelerator is designed that uses a systolic pipeline and an inter-layer pipeline to increase parallelism. The binarized coding activation function reduces intermediate feature-map memory by 72%. Running at 120 MHz, the accelerator achieves 23,080 FPS on 28×28 images, with a 0.13% drop in inference accuracy on the MNIST test set. Implemented on the VC707 FPGA, it consumes 0.67 W of on-chip power and delivers 332.3 GOPS/W, an 11% improvement over a state-of-the-art BNN accelerator for MNIST classification.

(3) For floating-point input images, a BNET-12 accelerator is designed that uses an inter-layer pipeline to improve parallelism. The binarized coding activation function reduces intermediate feature-map memory by 50%, and the inter-layer pipeline reduces it by a further 48%. Running at 120 MHz, the accelerator achieves 9,230 FPS on 3×32×32 images. Inference accuracy drops by 0.17% on the SVHN test set and by 0.56% on the CIFAR-10 test set. Implemented on the VC707 FPGA, it consumes 4.9 W of on-chip power and delivers 1,883.7 FPS/W, a 1.5× improvement over a state-of-the-art BNN accelerator for CIFAR-10 classification.

(4) A BNET-5 accelerator is designed around a 7×22 reconfigurable systolic array that can be configured for 7×7 convolution, 5×5 convolution, 3×3 convolution, and fully connected computation. The binarized coding activation function reduces intermediate feature-map memory by 71%. Running at 120 MHz, the accelerator achieves 6,700 FPS on 28×28 images with no loss of inference accuracy on the MNIST test set. Implemented on the VC707 FPGA, it consumes 0.51 W of on-chip power and delivers 41.0 GOPS/W and 13,100 FPS/W, a 5.5× improvement over a 16-bit fixed-point CNN accelerator for MNIST classification whose network size is close to that of BNET-5.
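The claim that binary weights remove multiplication from convolution rests on the standard XNOR-popcount identity: if -1 is encoded as bit 0 and +1 as bit 1, the product of two {-1,+1} values equals the XNOR of their bits, so an n-element dot product reduces to one XNOR word operation and a popcount. The Python sketch below is purely illustrative (the thesis implements this in hardware); the function name and the LSB-first bit packing are assumptions made for the example.

```python
def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two n-element {-1,+1} vectors packed LSB-first as
    integers (bit value 0 encodes -1, bit value 1 encodes +1).

    Illustrative software sketch only; the thesis realizes this in RTL.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ w_bits) & mask   # bit is 1 where the two values match
    matches = bin(xnor).count("1")     # popcount: number of +1 products
    return 2 * matches - n             # (#matches) - (#mismatches)

# a = (+1, +1, -1, +1) -> 0b1011 (LSB first), w = (+1, -1, +1, +1) -> 0b1101
assert xnor_popcount_dot(0b1011, 0b1101, 4) == 0   # +1 - 1 - 1 + 1 = 0
```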
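The binarized coding activation function is described as replacing the BN layer's multiplications with XNOR and comparison. A common BNN technique consistent with that description, though not necessarily the thesis's exact formulation, folds batch normalization followed by sign() into a single precomputed threshold: sign(gamma*(x - mu)/sigma + beta) depends only on whether the integer pre-activation x crosses tau = mu - beta*sigma/gamma, with the comparison direction flipped when gamma < 0. A minimal sketch, assuming this folding:

```python
def fold_bn_sign(gamma, beta, mu, sigma):
    """Fold BatchNorm + sign() into one precomputed threshold.

    sign(gamma * (x - mu) / sigma + beta) is +1 exactly when
        x >= tau  (if gamma > 0)   or   x <= tau  (if gamma < 0),
    where tau = mu - beta * sigma / gamma. All multiplications happen
    offline; only a comparison remains at inference time.
    """
    tau = mu - beta * sigma / gamma
    return tau, gamma < 0            # (threshold, flip comparison?)

def binary_activate(x, tau, flip):
    """Binarize an integer pre-activation x (0 is treated as +1)."""
    return 1 if ((x <= tau) if flip else (x >= tau)) else -1

# Example: gamma=2.0, beta=0.5, mu=3.0, sigma=4.0 -> tau = 3 - 0.5*4/2 = 2.0
tau, flip = fold_bn_sign(2.0, 0.5, 3.0, 4.0)
assert binary_activate(5, tau, flip) == 1    # BN output 2*(5-3)/4+0.5 = 1.5
assert binary_activate(1, tau, flip) == -1   # BN output 2*(1-3)/4+0.5 = -0.5
```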
Keywords/Search Tags: Binary Convolutional Neural Network, Systolic Pipeline, Inter-Layer Pipeline, Reconfigurable Array, Image Classification