
The Design And FPGA Verification Of CNN Accelerator Based On Group Pruning

Posted on: 2021-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: C L Wu
Full Text: PDF
GTID: 2518306476960329
Subject: IC Engineering

Abstract/Summary:
In recent years, with the development of deep learning and the growth of hardware computing power, convolutional neural networks (CNNs) have gradually become synonymous with high performance in computer vision and play an important role in IoT and edge-computing applications. However, this high performance is built on intensive computation and a large number of parameters, which poses a significant challenge for deploying CNNs on terminal devices. To make deep learning techniques widely deployable, compressing networks and accelerating the inference of convolutional neural networks is of great research value. From the perspective of mining the redundancy in parameters and operations, this thesis designs both an algorithm and a hardware accelerator that remove as much of that redundancy as possible to improve the efficiency of model inference.

First, based on the implementation of group convolution and the computational form of the compute array, a group pruning algorithm is designed to alleviate the deletion-redundancy problem of structured pruning. Second, to address the limitations of L2 regularization under group pruning, group sparse regularization is adopted to reduce accuracy loss while improving the pruning rate. Finally, an accelerator is designed for the sparse network produced by group pruning: a memory-access model is built for the array structure, and the optimal design scale and computation pattern are explored. The sparse network is further accelerated through the design of sparse compute elements, a register stack, functional layers, and system scheduling.

At a pruning rate of 87.5%, computation is reduced by 75.4% on LeNet-5 and 86.9% on VGG-16, and GPU inference achieves speedups of 2.53× and 2.15×, with error increases of zero and 0.48%, respectively. Synthesis was completed in SMIC's 40 nm process; at a frequency of 200 MHz and a voltage of 1.1 V, the total power consumption of the accelerator is 141.08 mW and the core area is 1.867 mm². The thesis also evaluates the accelerator's performance on a Xilinx VC707 board: it achieves 188.41 GOPS at 100 MHz and consumes only 8.15 W on the VGG-16 benchmark, for a power efficiency of 23.1 GOPS/W. This design method for a CNN hardware accelerator based on group pruning provides an important reference for the research and design of lightweight, low-energy AI terminals.
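The core idea above can be illustrated with a minimal sketch: group pruning removes entire groups of weights ranked by their L2 norms, and the group sparse (group-lasso) regularizer penalizes the sum of group norms so whole groups are driven toward zero during training. This NumPy formulation, the function names, and the flat grouping of weights are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def group_prune(weights, group_size, prune_rate):
    """Zero out the weight groups with the smallest L2 norms.

    weights:    1-D array of layer weights, length divisible by group_size
    prune_rate: fraction of groups to remove (e.g. 0.875 in the thesis)
    Returns the pruned weights and a boolean mask of surviving groups.
    """
    groups = weights.reshape(-1, group_size).copy()
    norms = np.linalg.norm(groups, axis=1)          # one L2 norm per group
    n_prune = int(len(norms) * prune_rate)
    weakest = np.argsort(norms)[:n_prune]           # groups to delete
    mask = np.ones(len(norms), dtype=bool)
    mask[weakest] = False
    groups[~mask] = 0.0                             # remove whole groups
    return groups.reshape(-1), mask

def group_sparse_penalty(weights, group_size, lam=1e-4):
    """Group-lasso regularizer: lam * sum of per-group L2 norms."""
    groups = weights.reshape(-1, group_size)
    return lam * np.linalg.norm(groups, axis=1).sum()
```

Because the penalty is a sum of (non-squared) group norms rather than a plain L2 term, its gradient pushes all weights in a weak group toward zero together, which is what makes the subsequent group-wise deletion incur less accuracy loss at high pruning rates.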
Keywords/Search Tags: Deep learning, Convolutional neural network acceleration, Network pruning, FPGA accelerator