Design Of FPGA Convolution Neural Network Accelerator Based On HLS

Posted on:2022-06-06

Degree:Master

Type:Thesis

Country:China

Candidate:L Huang

GTID:2568306323470704

Subject:Circuits and Systems

Abstract/Summary:

Convolutional neural networks (CNNs) are widely used in computer vision. However, CNN models are both compute-intensive and memory-intensive, which makes them challenging to implement efficiently on embedded devices with relatively limited resources. This thesis proposes an acceleration architecture developed with a high-level synthesis (HLS) tool. The main contributions are:

(1) To address the cumbersome and complicated process of developing FPGA-based CNN accelerators in a hardware description language, an efficient HLS-based FPGA accelerator that supports multiple CNN models is proposed. Three representative networks were selected for acceleration: VGG-16, ResNet-18 and SqueezeNet.

(2) The performance of the HLS-based accelerator is improved through multi-level optimization. Loop unrolling and loop tiling are applied. Input data reuse based on loop interchange avoids repeated external memory accesses. A FIFO-based convolution input data bus reuses the input data, improving both memory access efficiency and data reusability. Two fully pipelined processing units (FPUs), a Data-Fetching Unit (DFU) and a Calculation and Accumulation Unit (CAU), are employed in the convolution module to improve computational efficiency. The convolution module is pipelined using ping-pong buffers: its three stages (data input, computation and storage) can start simultaneously without mutual dependence, so computation time overlaps with the data transfer overhead between DRAM and the FPGA's on-chip buffers. For the branching structure of ResNet, input data is reused to reduce redundant data transmission.

(3) At the system level, data transmission efficiency is improved by a streaming structure based on multiple DMA channels. Data is represented in fixed-point format to optimize computation and storage efficiency.

Several CNN models are accelerated on the Xilinx ZC706 platform. Experimental results show that ResNet-18 achieves a throughput of 227.8 GOPS on the accelerator and VGG-16 achieves 162.9 GOPS. For ResNet-18, the resource efficiency is 3.08 GOPS/kLUT and the energy efficiency reaches 167.5 GOPS/W.
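The loop-level ideas named in the abstract (tiling, unrolling candidates, loop interchange for input reuse, ping-pong buffering and fixed-point data) can be sketched in plain C++ as follows. This is a minimal software model, not the thesis's implementation: the tile sizes `TM` and `TN`, the Q8.8 format, the `Layer` struct and all function names are assumptions made for illustration, and the sequential `load_tile` call only marks where a hardware design would overlap data transfer with computation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// All sizes and the Q8.8 format are illustrative assumptions, not thesis values.
constexpr int TM = 4;         // output-channel tile (candidate for unrolling)
constexpr int TN = 2;         // input-channel tile (candidate for unrolling)
constexpr int FRAC_BITS = 8;  // fractional bits of the assumed fixed-point format

using fixed_t = std::int16_t;  // stored activations and weights
using acc_t = std::int32_t;    // wider accumulator to avoid overflow

// Quantizes a float to the assumed fixed-point format.
inline fixed_t to_fixed(float x) {
    return static_cast<fixed_t>(std::lround(x * (1 << FRAC_BITS)));
}

struct Layer {
    int M, N, R, C, K;  // out channels, in channels, out rows, out cols, kernel size
    int H() const { return R + K - 1; }  // input rows (stride 1, no padding)
    int W() const { return C + K - 1; }  // input cols
};

// Plain loop nest, used as the reference result.
std::vector<acc_t> conv_reference(const Layer& l, const std::vector<fixed_t>& in,
                                  const std::vector<fixed_t>& w) {
    std::vector<acc_t> out(l.M * l.R * l.C, 0);
    for (int m = 0; m < l.M; ++m)
      for (int n = 0; n < l.N; ++n)
        for (int r = 0; r < l.R; ++r)
          for (int c = 0; c < l.C; ++c)
            for (int i = 0; i < l.K; ++i)
              for (int j = 0; j < l.K; ++j)
                out[(m * l.R + r) * l.C + c] +=
                    acc_t(w[((m * l.N + n) * l.K + i) * l.K + j]) *
                    acc_t(in[(n * l.H() + r + i) * l.W() + c + j]);
    return out;
}

// Copies TN input channels starting at channel n0 into an on-chip-style buffer.
void load_tile(const Layer& l, const std::vector<fixed_t>& in, int n0,
               std::vector<fixed_t>& buf) {
    const int plane = l.H() * l.W();
    buf.assign(in.begin() + n0 * plane, in.begin() + (n0 + TN) * plane);
}

// Tiled loop nest: the input-channel tile loop is outermost, so each input
// tile is loaded once and reused for every output-channel tile.
std::vector<acc_t> conv_tiled(const Layer& l, const std::vector<fixed_t>& in,
                              const std::vector<fixed_t>& w) {
    assert(l.M % TM == 0 && l.N % TN == 0);
    std::vector<acc_t> out(l.M * l.R * l.C, 0);
    std::vector<fixed_t> buf[2];  // ping-pong pair
    const int tiles = l.N / TN;
    load_tile(l, in, 0, buf[0]);  // prologue: fill the first buffer
    for (int t = 0; t < tiles; ++t) {
        // In hardware this load would run concurrently with the compute below.
        if (t + 1 < tiles) load_tile(l, in, (t + 1) * TN, buf[(t + 1) % 2]);
        const std::vector<fixed_t>& cur = buf[t % 2];
        for (int m0 = 0; m0 < l.M; m0 += TM)
          for (int r = 0; r < l.R; ++r)
            for (int c = 0; c < l.C; ++c)
              for (int i = 0; i < l.K; ++i)
                for (int j = 0; j < l.K; ++j)
                  for (int mi = 0; mi < TM; ++mi)      // would be unrolled
                    for (int ni = 0; ni < TN; ++ni) {  // would be unrolled
                        const int m = m0 + mi, n = t * TN + ni;
                        out[(m * l.R + r) * l.C + c] +=
                            acc_t(w[((m * l.N + n) * l.K + i) * l.K + j]) *
                            acc_t(cur[(ni * l.H() + r + i) * l.W() + c + j]);
                    }
    }
    return out;
}
```

Because accumulation is done in integers, `conv_tiled` produces exactly the same values as `conv_reference`; only the order of memory accesses changes. In an HLS flow, the two innermost loops would typically carry an unroll directive and the loop above them a pipeline directive, with the two buffers mapped to separate on-chip memories so that loading and computing can proceed in parallel.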

Keywords/Search Tags:

CNN accelerator, HLS, Fully-pipelined, Ping-Pong buffer

Related items

1	A Convolutional Neural Networks Accelerator Based On Parallel Memory Technology
2	On-board Vision System For Humanoid Ping-Pong Robot
3	Research On The Control System And Hit Strategy Of Ping Pong Robot
4	The Design Of High-speed Data Transmission System Based On PCI
5	The Research On The Trajectory Prediction And Classification Of The Table Tennis Trajectory
6	On-board Vision System For Humanoid Ping-pong Robot Without Marks And Trajectory Prediction
7	Flame Monitoring System Based On The Dm642 Embedded Forest Areas For Study
8	Video Image Processor System Software Research
9	Study On The Two-ear Positioning Model Based On Bats
10	Modeling Research On Trajectory Of Ping-pong Based On Binocular Vision