
Research and Implementation of FPGA Acceleration of Compressed Convolutional Neural Networks

Posted on: 2020-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: L P Li
Full Text: PDF
GTID: 2428330602952365
Subject: Engineering
Abstract/Summary:
Field-programmable gate arrays (FPGAs) are widely considered a promising platform for convolutional neural network (CNN) acceleration due to their high parallelism, high energy efficiency, rich computing resources, and flexible configuration. However, the large number of parameters in CNNs imposes heavy computing and memory burdens on FPGA-based implementations. In addition, most implementations treat the algorithm as a black box, optimizing only the hardware architecture while ignoring algorithmic improvements, which makes it difficult to deploy such bulky models on embedded systems such as phones, drones, and tablets with limited hardware resources and tight power budgets.

From a software/hardware co-design perspective, this thesis combines algorithm optimization with hardware architecture design, compressing the CNN model and accelerating the compressed model on an FPGA. On the model-compression side, the thesis proposes reverse-pruning and peak-pruning strategies that significantly reduce the number of parameters and the amount of computation of a trained CNN without affecting its accuracy. The pruned model is then quantized for further compression, and an efficient data-storage scheme is proposed for the convolutional and fully connected layers of the CNN, greatly reducing the extra cache overhead.

On the FPGA-acceleration side, the design is built around a Zynq UltraScale+ MPSoC device. The PS (Processing System) side of the Zynq acts as the control center and implements the fully connected layers and the softmax function of the output layer, while the PL (Programmable Logic) side serves as the acceleration core, implementing the convolutional and pooling layers.

To verify the effectiveness of the model-compression strategy, the thesis takes AlexNet as an example. The compressed AlexNet is designed and implemented with the Xilinx FPGA development kit on a ZCU104 development board. Testing and analysis show that the proposed compression strategy reduces the size of AlexNet by 28×, from 243 MB to 8.7 MB. The overall FPGA acceleration framework achieves 9.73 FPS (frames per second) on the compressed AlexNet. Compared with central processing unit (CPU) and graphics processing unit (GPU) platforms, the implementation achieves 182.3× and 1.1× improvements in latency and throughput, respectively, on the convolutional (CONV) layers of AlexNet, together with 822.0× and 15.8× improvements in energy efficiency. This compression strategy also provides a reference for recurrent neural networks (RNNs), generative adversarial networks (GANs), and other neural network applications.
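The abstract does not spell out the reverse-pruning and peak-pruning algorithms themselves, but the general pipeline it describes (prune small weights, quantize the survivors, store them sparsely) can be sketched with generic magnitude-based pruning and uniform 8-bit quantization. This is a minimal illustrative sketch under those assumptions, not the thesis's actual method; all function names here are hypothetical.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Generic magnitude pruning: zero out the smallest-magnitude
    weights until `sparsity` fraction of entries are zero."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # Threshold = k-th smallest absolute value.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= thresh] = 0.0
    return pruned

def quantize_uint8(w):
    """Uniform affine quantization of the weights to 8 bits,
    returning the codes plus the scale/offset needed to dequantize."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)

pruned = magnitude_prune(w, sparsity=0.9)
q, scale, zero = quantize_uint8(pruned)

# Sparse storage: keep only the nonzero codes and their flat indices,
# so the memory footprint tracks the surviving weights rather than
# the dense layer size.
idx = np.flatnonzero(pruned)
vals = q.ravel()[idx]
print(f"kept {idx.size}/{w.size} weights")
```

In a real deployment the index/value pairs would be packed into the format the accelerator expects (e.g. relative indices to save bits), which is the kind of cache-overhead reduction the abstract alludes to.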
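The reported 28× compression factor is consistent with the quoted model sizes, as a quick arithmetic check shows:

```python
# Sanity-check the reported compression ratio: 243 MB -> 8.7 MB.
original_mb, compressed_mb = 243.0, 8.7
ratio = original_mb / compressed_mb
print(f"{ratio:.1f}x")  # prints "27.9x", which the thesis rounds to 28x
```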
Keywords/Search Tags: Field-Programmable Gate Array, Energy Efficiency, Convolutional Neural Network, Software/Hardware Co-design, Model Compression