
Hardware Acceleration And Implementation Of GoogLeNet Network Based On Sparse Convolution

Posted on: 2021-05-18  Degree: Master  Type: Thesis
Country: China  Candidate: Z H Bai  Full Text: PDF
GTID: 2428330614970926  Subject: Signal and Information Processing
Abstract/Summary:
With the development of artificial intelligence, convolutional neural networks have become a hot research area. However, because of their high computational complexity, traditional CPUs cannot meet real-time requirements. Although GPUs are widely used for network training, their high power consumption makes them unsuitable for embedded applications. FPGAs, with their low power consumption, reconfigurability, and low latency, have therefore gradually become a research hotspot. At present, the conventional approach to deploying a convolutional neural network on an FPGA is to build a large multiply-accumulate array. The peak performance of this approach is limited by the number of multiplier units on the FPGA, and it cannot exploit parameter redundancy to achieve higher performance. This thesis addresses these problems for GoogLeNet with the following methods:

(1) This thesis proposes a multi-dimensional model compression framework, comprising pruning, clustering, and quantization, to lighten GoogLeNet and reduce its large computation and parameter counts. Based on the pruning rate and the distribution of parameters in the different convolution layers of GoogLeNet, the pruning threshold is adjusted dynamically and unimportant parameters are removed. The K-Means clustering algorithm is then used to cluster the convolution kernel weights, with the number of clusters chosen according to the kernel size and the number of non-zero parameters to achieve the best clustering effect. Finally, the Ristretto algorithm is used for 8-bit quantization to reduce the storage space of GoogLeNet. Experimental results show that, after applying the three compression methods, the storage space of the GoogLeNet model is reduced to one tenth of the original model and the computation is reduced to one quarter.

(2) Based on the OpenCL heterogeneous computing framework, and combining the compressed GoogLeNet model with the ABM-SpConv sparse convolution algorithm proposed by the research group, this thesis designs a hardware architecture for GoogLeNet. Addition and multiplication are decoupled into two stages: the feature-map data corresponding to each weight are first accumulated in the addition unit and then multiplied by that weight in the multiplication unit, which reduces the number of hard multiplier cores required. The thesis also proposes fusing the BN layer into the convolution layer to reduce deployment difficulty, and encodes the weight parameters to address the low efficiency of memory access. Finally, a complete design space exploration flow is designed: through theoretical modeling and analysis of resources, frequency, and performance, the optimal configuration of this architecture on the target board is obtained. This work deploys GoogLeNet on an Arria 10 GX FPGA with excellent results. Under the optimal parameter configuration, recognizing one image takes 3.4 ms and the throughput is 1456 GOPS; the energy efficiency is 34 times that of a CPU and 4 times that of a GPU. Compared with the previous best architecture, this work doubles the speed and achieves three times the throughput.
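As a rough illustration of the per-layer dynamic-threshold pruning described in (1), the sketch below prunes one convolution layer by magnitude, deriving the threshold from that layer's own weight distribution so that a target pruning rate is met. The function and parameter names (`prune_layer`, `target_rate`) are illustrative and not taken from the thesis.

```python
import numpy as np

def prune_layer(weights: np.ndarray, target_rate: float) -> np.ndarray:
    """Magnitude pruning: zero out the smallest weights of one layer.

    The threshold is computed from this layer's own weight distribution,
    so different layers end up with different thresholds, mimicking the
    per-layer dynamic threshold described in the abstract.
    """
    flat = np.abs(weights).ravel()
    # Threshold below which `target_rate` of the weights fall.
    threshold = np.quantile(flat, target_rate)
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune roughly 70% of a 64x64x3x3 convolution kernel.
w = np.random.randn(64, 64, 3, 3).astype(np.float32)
w_pruned = prune_layer(w, target_rate=0.7)
print("non-zero ratio:", np.count_nonzero(w_pruned) / w_pruned.size)
```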
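The weight-clustering step can be sketched with scikit-learn's KMeans: each non-zero kernel weight is replaced by its nearest cluster centroid, so only the centroid table and per-weight cluster indices need to be stored. The choice of 16 clusters here is an arbitrary example, not the setting used in the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_weights(weights: np.ndarray, n_clusters: int = 16):
    """Cluster the non-zero weights of one layer and share centroids."""
    nz_idx = np.nonzero(weights.ravel())[0]
    nz_vals = weights.ravel()[nz_idx].reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(nz_vals)
    shared = weights.ravel().copy()
    # Replace each non-zero weight by the centroid of its cluster.
    shared[nz_idx] = km.cluster_centers_[km.labels_, 0]
    return shared.reshape(weights.shape), km.cluster_centers_.ravel()
```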
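The 8-bit quantization step is in the spirit of Ristretto's dynamic fixed point; the sketch below is a generic fixed-point quantizer under that assumption, not the thesis's exact procedure. It allocates enough integer bits to cover the layer's value range and rounds values to the resulting step size.

```python
import numpy as np

def quantize_fixed_point(x: np.ndarray, total_bits: int = 8) -> np.ndarray:
    """Quantize a tensor to `total_bits` dynamic fixed point (sign bit included)."""
    # Integer bits needed to cover the largest magnitude in this layer.
    int_bits = int(np.ceil(np.log2(np.max(np.abs(x)) + 1e-12))) + 1
    frac_bits = total_bits - int_bits
    step = 2.0 ** (-frac_bits)
    q_min, q_max = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x / step), q_min, q_max)
    return q * step
```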
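Fusing a batch-normalization layer into the preceding convolution, as proposed in (2), is a standard transformation. A minimal sketch, assuming the usual BN parameters (gamma, beta, running mean and variance), is:

```python
import numpy as np

def fuse_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into conv weights w [OC, IC, KH, KW] and bias b [OC]."""
    scale = gamma / np.sqrt(var + eps)        # per-output-channel scale
    w_fused = w * scale[:, None, None, None]  # scale each output-channel filter
    b_fused = (b - mean) * scale + beta       # absorb BN shift into the bias
    return w_fused, b_fused
```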
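The decoupling of addition and multiplication described in (2) follows the accumulate-before-multiply idea: because clustered weights take only a few distinct values, the activations that share the same weight value can be summed first, and each partial sum is multiplied by that value once. Below is a simplified software sketch of this idea for one output pixel; the names are illustrative, and the real design is an OpenCL kernel on the FPGA.

```python
import numpy as np
from collections import defaultdict

def abm_output_pixel(patch: np.ndarray, kernel: np.ndarray) -> float:
    """Accumulate-before-multiply dot product for one output pixel.

    `patch` and `kernel` are flattened and equally sized; the kernel holds
    clustered weights, so many entries share the same value.
    """
    sums = defaultdict(float)
    for a, w in zip(patch.ravel(), kernel.ravel()):
        if w != 0.0:          # skip pruned (zero) weights
            sums[w] += a      # accumulate activations per distinct weight value
    # One multiplication per distinct weight value instead of per weight.
    return sum(w * s for w, s in sums.items())
```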
Keywords/Search Tags:Sparse convolution, GoogLeNet, OpenCL, FPGA, Image Recognition, Heterogeneous Computing