
Design And Implementation Of A Reconfigurable Convolutional Neural Network Accelerator Based On FPGA

Posted on: 2022-03-19    Degree: Master    Type: Thesis
Country: China    Candidate: T S Chen    Full Text: PDF
GTID: 2518306539968619    Subject: Circuits and Systems
Abstract/Summary:
In recent years, the Convolutional Neural Network (CNN) has been widely used in many fields such as computer vision, speech recognition, and document analysis. Although CNNs have shown excellent performance in many application scenarios, this comes at the cost of high computational complexity. In many application scenarios, CNNs need to perform forward inference on embedded platforms for reasons such as real-time operation and data security. Because many embedded platforms impose strict power, compute, and memory constraints, efficient processing of forward inference is essential. The Field Programmable Gate Array (FPGA), as a kind of semi-customized hardware circuit, offers flexible design and a high performance-to-power ratio, and has gradually become a research hotspot for CNN hardware acceleration. An accelerator designed for a specific CNN can readily achieve the full computational throughput of the FPGA; however, such an accelerator can only run the network it was designed for, or delivers low performance on other networks. It is therefore of great significance to design a reconfigurable CNN accelerator. Focusing on the reconfigurable hardware architecture, performance, power consumption, and other aspects, this thesis presents the design and implementation of a reconfigurable CNN accelerator based on FPGA. The main work of this thesis is as follows:

1) The convolutional computing characteristics of CNNs are analyzed, and a reconfigurable computing cluster and a reconfigurable on-chip buffer are designed. The reconfigurable computing cluster supports functions such as convolution and nonlinear activation. The reconfigurable on-chip buffer performs zero padding of the input feature map, handles the overlap between feature-map tiles, and transfers data in a prescribed order. The five-stage pipeline structure of the reconfigurable computing cluster fully reuses the DSP resources of the FPGA, which effectively improves the computing power of the accelerator. The reconfigurable on-chip buffer makes full use of the data transmission characteristics of DMA (Direct Memory Access) to improve the efficiency of data transfers.

2) Based on the designed hardware accelerator, a calculation method for finding the optimal feature-map tiling parameters is proposed. This method evaluates the computational performance and data transmission bandwidth of the accelerator for convolutional layers of different sizes and finds the optimal feature-map tiling parameters, thereby achieving the best performance of the accelerator (an illustrative sketch of such a search follows this abstract).

3) Three CNNs, VGG16, ResNet50, and YOLOv2-tiny, are selected as test benchmarks, and the networks are quantized to 16-bit fixed point without fine-tuning (an illustrative quantization sketch also follows this abstract). The quantization errors of the Top-1 and Top-5 accuracy of VGG16 and ResNet50 are both less than 3%; the quantization error of the mean average precision of YOLOv2-tiny is less than 3%, and its recall is reduced by less than 1%.

4) The FPGA-based reconfigurable accelerator is verified on the Xilinx Zynq ZC706 evaluation board. At a clock frequency of 200 MHz, the throughput on VGG16, ResNet50, and YOLOv2-tiny reaches 163.0 GOPS, 107.9 GOPS, and 121.2 GOPS, respectively. The corresponding FPGA chip power is 7.6 W, 6.8 W, and 6.7 W, and the evaluation board power is 20.4 W, 19.8 W, and 19.2 W, respectively.
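As a point of reference for work item 2), the following is a minimal Python sketch of a roofline-style search over feature-map tiling parameters. It is not the thesis's actual method: the cost model, the hardware parameters (num_pe, bw_bytes_per_cycle, buf_bytes, word_bytes), and the layer dictionary keys are all illustrative assumptions.

    from itertools import product
    from math import ceil

    def divisors(n):
        # Candidate tile sizes: the divisors of each loop dimension.
        return [d for d in range(1, n + 1) if n % d == 0]

    def search_tiling(layer, num_pe=64, bw_bytes_per_cycle=8,
                      buf_bytes=512 * 1024, word_bytes=2):
        # layer: {"N": input channels, "M": output channels,
        #         "R"/"C": output rows/cols, "K": kernel size}.
        N, M, R, C, K = layer["N"], layer["M"], layer["R"], layer["C"], layer["K"]
        macs = M * N * R * C * K * K          # total multiply-accumulates
        best = None
        for Tm, Tn, Tr, Tc in product(divisors(M), divisors(N),
                                      divisors(R), divisors(C)):
            # On-chip storage for one input tile, weight tile, and output tile.
            in_tile = Tn * (Tr + K - 1) * (Tc + K - 1)
            w_tile = Tm * Tn * K * K
            out_tile = Tm * Tr * Tc
            if (in_tile + w_tile + out_tile) * word_bytes > buf_bytes:
                continue                      # does not fit in the on-chip buffer
            # Compute time: MACs divided by the parallelism actually usable.
            compute_cycles = macs / min(num_pe, Tm * Tn)
            # DMA time: bytes moved for all tiles of the layer over the bus.
            tiles = ceil(M / Tm) * ceil(N / Tn) * ceil(R / Tr) * ceil(C / Tc)
            dma_cycles = (tiles * (in_tile + w_tile + out_tile) * word_bytes
                          / bw_bytes_per_cycle)
            # The layer latency is bounded by the slower of compute and DMA.
            cycles = max(compute_cycles, dma_cycles)
            if best is None or cycles < best[0]:
                best = (cycles, (Tm, Tn, Tr, Tc))
        return best

    # Example: a hypothetical VGG16-style layer (224x224 output, 3x3 kernels).
    cycles, tiling = search_tiling({"N": 64, "M": 64, "R": 224, "C": 224, "K": 3})
    print(tiling, cycles)

In the spirit of the abstract, such an evaluation would be repeated for each convolutional layer, so that every layer runs with its own tiling parameters and the accelerator stays either compute-bound or bandwidth-bound by design rather than by accident.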
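Work item 3) quantizes the benchmark networks to 16-bit fixed point without fine-tuning. The abstract does not specify the exact scheme, so the NumPy sketch below shows one common per-tensor approach, assuming the fractional bit width is chosen from the tensor's dynamic range; the function names and the error check are illustrative.

    import numpy as np

    def quantize_fixed_point(x, total_bits=16):
        # Choose the number of fractional bits so the largest magnitude
        # still fits in a signed fixed-point word of total_bits bits.
        max_abs = float(np.max(np.abs(x)))
        int_bits = 0 if max_abs < 1.0 else int(np.ceil(np.log2(max_abs)))
        frac_bits = total_bits - 1 - int_bits     # 1 bit reserved for the sign
        scale = 2.0 ** frac_bits
        q = np.clip(np.round(x * scale),
                    -(2 ** (total_bits - 1)),
                    2 ** (total_bits - 1) - 1).astype(np.int16)
        return q, frac_bits

    def dequantize(q, frac_bits):
        # Map fixed-point codes back to floats, e.g. to measure accuracy loss.
        return q.astype(np.float32) / (2.0 ** frac_bits)

    # Example: quantize a random weight tensor and check the worst-case error.
    w = np.random.randn(64, 64, 3, 3).astype(np.float32)
    q, f = quantize_fixed_point(w)
    print(f, np.max(np.abs(dequantize(q, f) - w)))

Keeping weights and activations in 16-bit fixed point matches the native operand width of the FPGA's DSP blocks, which is consistent with the small accuracy degradation reported in the abstract.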
Keywords/Search Tags: FPGA, convolutional neural network, reconfigurable, accelerator