
Research and Implementation of FPGA Acceleration of Compressed Convolutional Neural Networks

Posted on: 2020-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: L P Li
Full Text: PDF
GTID: 2428330602952365
Subject: Engineering
Abstract/Summary:
Field-programmable gate arrays (FPGAs) are widely considered a promising platform for convolutional neural network (CNN) acceleration due to their high parallelism, high energy efficiency, rich computing resources, and flexible configuration. However, the large number of parameters in CNNs imposes heavy computing and memory burdens on FPGA-based implementations. In addition, most implementations treat the algorithm as a black box, optimizing only the hardware architecture while ignoring algorithmic improvements, which makes it difficult to deploy such bulky models on embedded systems such as phones, drones, and tablets with limited hardware resources and tight power budgets.

From a software/hardware co-design perspective, this thesis combines algorithm optimization with hardware architecture design, compressing the CNN model and accelerating the compressed model on an FPGA. On the model-compression side, the thesis proposes reverse-pruning and peak-pruning strategies that significantly reduce the number of parameters and the amount of computation of a trained CNN without affecting its accuracy. The pruned model is then quantized for further compression, and an efficient data-storage scheme is proposed for the convolutional and fully connected layers of the CNN, greatly reducing the extra cache overhead.

On the FPGA-acceleration side, the design is built around a Zynq UltraScale+ MPSoC device. The PS (Processing System) side of the Zynq acts as the control center and implements the fully connected layers and the softmax function of the output layer, while the PL (Programmable Logic) side serves as the acceleration core, implementing the convolutional and pooling layers.

To verify the effectiveness of the model-compression strategy, the thesis takes AlexNet as an example. The compressed AlexNet is designed and implemented with the Xilinx FPGA development kit on a ZCU104 development board. Testing and analysis show that the proposed compression strategy reduces the size of AlexNet by 28×, from 243 MB to 8.7 MB. The overall FPGA acceleration framework achieves 9.73 FPS (frames per second) on the compressed AlexNet. Compared with central processing unit (CPU) and graphics processing unit (GPU) platforms, the implementation achieves 182.3× and 1.1× improvements in latency and throughput, respectively, on the convolutional (CONV) layers of AlexNet, together with 822.0× and 15.8× improvements in energy efficiency. This compression strategy also provides a reference for recurrent neural networks (RNNs), generative adversarial networks (GANs), and other neural network applications.
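The abstract does not spell out the reverse-pruning and peak-pruning algorithms themselves, but the general pipeline it describes (prune small weights, quantize the survivors, store them sparsely) can be sketched with generic magnitude-based pruning and uniform 8-bit quantization. This is a minimal illustrative sketch under those assumptions, not the thesis's actual method; all function names here are hypothetical.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Generic magnitude pruning: zero out the smallest-magnitude
    weights until `sparsity` fraction of entries are zero."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # Threshold = k-th smallest absolute value.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= thresh] = 0.0
    return pruned

def quantize_uint8(w):
    """Uniform affine quantization of the weights to 8 bits,
    returning the codes plus the scale/offset needed to dequantize."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)

pruned = magnitude_prune(w, sparsity=0.9)
q, scale, zero = quantize_uint8(pruned)

# Sparse storage: keep only the nonzero codes and their flat indices,
# so the memory footprint tracks the surviving weights rather than
# the dense layer size.
idx = np.flatnonzero(pruned)
vals = q.ravel()[idx]
print(f"kept {idx.size}/{w.size} weights")
```

In a real deployment the index/value pairs would be packed into the format the accelerator expects (e.g. relative indices to save bits), which is the kind of cache-overhead reduction the abstract alludes to.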
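The reported 28× compression factor is consistent with the quoted model sizes, as a quick arithmetic check shows:

```python
# Sanity-check the reported compression ratio: 243 MB -> 8.7 MB.
original_mb, compressed_mb = 243.0, 8.7
ratio = original_mb / compressed_mb
print(f"{ratio:.1f}x")  # prints "27.9x", which the thesis rounds to 28x
```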
Keywords/Search Tags: Field-Programmable Gate Array, Energy Efficiency, Convolutional Neural Network, Software/Hardware Co-design, Model Compression