
The Design And FPGA Verification Of CNN Accelerator Based On Group Pruning

Posted on: 2021-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: C L Wu
Full Text: PDF
GTID: 2518306476960329
Subject: IC Engineering

Abstract/Summary:
In recent years, with the development of deep learning and the growth of hardware computing power, convolutional neural networks (CNNs) have gradually become synonymous with high performance in computer vision and play an important role in IoT and edge-computing applications. However, this high performance is built on intensive computation and a large number of parameters, which poses a significant challenge for deploying CNNs on terminal devices. To make deep learning techniques widely deployable, compressing networks and accelerating the inference of convolutional neural networks is of great research value. From the perspective of mining the redundancy in parameters and operations, this thesis designs both an algorithm and a hardware accelerator that remove as much of that redundancy as possible to improve the efficiency of model inference.

First, based on the implementation of group convolution and the computational form of the compute array, a group pruning algorithm is designed to alleviate the deletion-redundancy problem of structured pruning. Second, to address the limitations of L2 regularization under group pruning, group sparse regularization is adopted to reduce accuracy loss while improving the pruning rate. Finally, an accelerator is designed for the sparse network produced by group pruning: a memory-access model is built for the array structure, and the optimal design scale and computation pattern are explored. The sparse network is further accelerated through the design of sparse compute elements, a register stack, functional layers, and system scheduling.

At a pruning rate of 87.5%, computation is reduced by 75.4% on LeNet-5 and 86.9% on VGG-16, and GPU inference achieves speedups of 2.53× and 2.15×, with error increases of zero and 0.48%, respectively. Synthesis was completed in SMIC's 40 nm process; at a frequency of 200 MHz and a voltage of 1.1 V, the total power consumption of the accelerator is 141.08 mW and the core area is 1.867 mm². The thesis also evaluates the accelerator's performance on a Xilinx VC707 board: it achieves 188.41 GOPS at 100 MHz and consumes only 8.15 W on the VGG-16 benchmark, for a power efficiency of 23.1 GOPS/W. This design method for a CNN hardware accelerator based on group pruning provides an important reference for the research and design of lightweight, low-energy AI terminals.
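The core idea above can be illustrated with a minimal sketch: group pruning removes entire groups of weights ranked by their L2 norms, and the group sparse (group-lasso) regularizer penalizes the sum of group norms so whole groups are driven toward zero during training. This NumPy formulation, the function names, and the flat grouping of weights are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def group_prune(weights, group_size, prune_rate):
    """Zero out the weight groups with the smallest L2 norms.

    weights:    1-D array of layer weights, length divisible by group_size
    prune_rate: fraction of groups to remove (e.g. 0.875 in the thesis)
    Returns the pruned weights and a boolean mask of surviving groups.
    """
    groups = weights.reshape(-1, group_size).copy()
    norms = np.linalg.norm(groups, axis=1)          # one L2 norm per group
    n_prune = int(len(norms) * prune_rate)
    weakest = np.argsort(norms)[:n_prune]           # groups to delete
    mask = np.ones(len(norms), dtype=bool)
    mask[weakest] = False
    groups[~mask] = 0.0                             # remove whole groups
    return groups.reshape(-1), mask

def group_sparse_penalty(weights, group_size, lam=1e-4):
    """Group-lasso regularizer: lam * sum of per-group L2 norms."""
    groups = weights.reshape(-1, group_size)
    return lam * np.linalg.norm(groups, axis=1).sum()
```

Because the penalty is a sum of (non-squared) group norms rather than a plain L2 term, its gradient pushes all weights in a weak group toward zero together, which is what makes the subsequent group-wise deletion incur less accuracy loss at high pruning rates.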
Keywords/Search Tags: Deep learning, Convolutional neural network acceleration, Network pruning, FPGA accelerator