
Target Detection Accelerator Design Based On FPGA

Posted on: 2023-06-01  Degree: Master  Type: Thesis
Country: China  Candidate: Z Gao  Full Text: PDF
GTID: 2568307025976899  Subject: Control Engineering
Abstract/Summary:
In recent years, object detection algorithms based on Convolutional Neural Networks (CNNs) have been widely used in fields such as industry, agriculture and the military. To handle increasingly complex application scenarios, network structures have grown more complex, and the number of parameters and computations has increased rapidly, which makes it harder to deploy object detection algorithms on low-power embedded devices. This dissertation therefore uses a programmable, energy-efficient Field Programmable Gate Array (FPGA) to design a low-power, high-throughput hardware accelerator for object detection algorithms, with the goal of implementing an efficient object detection system on embedded devices.

First, this dissertation studies the object detection accelerator at the system level. It analyzes the parameter count, structure and speed of several object detection models using a Roofline model of the Xilinx ZCU104 platform. Based on the estimated computational efficiency of each algorithm on the platform, YOLO v2 is selected as the target of the accelerator design. The YOLO v2 algorithm is then optimized for FPGAs: layer fusion simplifies the forward inference process, and the network parameters are quantized with a dynamic fixed-point representation to relieve the computational and storage pressure caused by floating-point data and operations. A study of convolution loop optimization determines a loop unrolling method that combines the input feature map channels with the output feature map channels, and a loop interchange method that reuses output feature map data, which together improve the parallelism and computational efficiency of the object detection system. Finally, a hardware/software co-processing mechanism and a system architecture based on on-chip heterogeneous computing are designed: each layer is computed on the ARM processor or the FPGA, whichever suits it better, and the data path of the accelerator system is designed.

Second, an efficient and general-purpose accelerator IP core is designed to accelerate arbitrary computational layers of YOLO v2. Specifically, loop unrolling, loop tiling, ping-pong buffering and multi-channel data transmission are used to design the functional modules of the IP core, which optimizes data transfer and latency and improves the accelerator's throughput. High-level synthesis tools are then used to optimize the implementation of the IP core, and the Block Design of the object detection system is completed in Vivado. The bitstream, weights and other files are loaded onto the FPGA, and the corresponding functions are implemented on the Processing System (PS), integrating a complete object detection system.

Third, experiments on the Xilinx ZCU104 platform verify the functional correctness of the design and analyze its power consumption and performance. The results show that the accelerator achieves a throughput of 28.3 GOPS and an energy efficiency of 7.1 GOPS/W at a power consumption of 3.98 W. This energy efficiency is 108 times that of running YOLO v2 on a CPU and 23.48 times that of running it on a GPU. A comparison with related work shows that the CNN accelerator achieves comparable throughput and power consumption and meets the requirements of embedded platform application scenarios.
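
The Roofline analysis mentioned in the abstract bounds a model's attainable performance by either the platform's peak compute rate or its memory bandwidth multiplied by the model's operational intensity. A minimal sketch of that bound follows. The peak and bandwidth figures and the layer names are hypothetical placeholders, not measured ZCU104 values or figures from the dissertation.

```python
def attainable_gops(peak_gops, bandwidth_gbs, ops_per_byte):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gops, bandwidth_gbs * ops_per_byte)


def ridge_point(peak_gops, bandwidth_gbs):
    """Operational intensity (ops/byte) at which the two roofs meet."""
    return peak_gops / bandwidth_gbs


# Hypothetical platform numbers, chosen only to make the example readable.
PEAK_GOPS = 100.0
BANDWIDTH_GBS = 10.0

# Hypothetical operational intensities for two workloads.
for name, intensity in [("workload_a", 2.0), ("workload_b", 25.0)]:
    bound = attainable_gops(PEAK_GOPS, BANDWIDTH_GBS, intensity)
    if intensity < ridge_point(PEAK_GOPS, BANDWIDTH_GBS):
        regime = "memory-bound"
    else:
        regime = "compute-bound"
    print(f"{name}: at most {bound:.1f} GOPS ({regime})")
```

Comparing each candidate model's bound on the same platform is one way to rank models before committing to a hardware design.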
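
The abstract does not say which layers are fused. For YOLO v2, a common form of layer fusion is folding batch normalization into the preceding convolution, and the sketch below assumes that form. The function names and the sample values are illustrative only.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of kernel weights per output channel.
    bias, gamma, beta, mean, var: one float per output channel.
    """
    fused_w, fused_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        fused_w.append([w * scale for w in w_c])
        fused_b.append((b_c - mu) * scale + bt)
    return fused_w, fused_b


def conv_dot(w_c, b_c, patch):
    """One output pixel of one channel: dot product plus bias."""
    return sum(w * x for w, x in zip(w_c, patch)) + b_c


def batchnorm(y, g, bt, mu, v, eps=1e-5):
    """Inference-time batch normalization of a single value."""
    return g * (y - mu) / math.sqrt(v + eps) + bt


# Illustrative single-channel example.
w, b = [[0.5, -1.0, 2.0]], [0.1]
gamma, beta, mean, var = [1.5], [0.2], [0.3], [4.0]
patch = [1.0, 2.0, 3.0]

separate = batchnorm(conv_dot(w[0], b[0], patch), gamma[0], beta[0], mean[0], var[0])
fw, fb = fold_batchnorm(w, b, gamma, beta, mean, var)
fused = conv_dot(fw[0], fb[0], patch)
print(abs(separate - fused) < 1e-9)
```

After folding, inference needs only the convolution, which removes the per-pixel normalization arithmetic from the hardware data path.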
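
Dynamic fixed-point quantization, as named in the abstract, gives each group of values (for example, the weights of one layer) its own fractional bit width, chosen from the group's value range. The sketch below illustrates the idea. The 16-bit word length, the per-group selection rule and the function names are assumptions made for illustration; the abstract does not state the dissertation's exact scheme.

```python
import math


def choose_frac_bits(values, word_bits=16):
    """Pick the largest fractional width that still covers max |value|."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return word_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits left of the binary point
    return word_bits - 1 - int_bits                # one bit is kept for the sign


def quantize(values, frac_bits, word_bits=16):
    """Scale by 2**frac_bits, round, and saturate to the signed word range."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Map integer codes back to real values."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


# Two illustrative groups with different ranges get different formats.
activations = [1.5, -6.0, 3.25]
weights = [0.3, -0.12, 0.05]
for name, group in [("activations", activations), ("weights", weights)]:
    fl = choose_frac_bits(group)
    codes = quantize(group, fl)
    print(name, "frac_bits =", fl, "codes =", codes)
```

Small-valued groups receive more fractional bits and large-valued groups more integer bits, so a single integer data path can serve layers with very different ranges.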
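
The loop optimizations described in the abstract can be modeled in software by tiling the convolution over output and input channels. In the sketch below, the two innermost loops stand in for the multiply-accumulate units that a hardware design would unroll in parallel, and each output tile keeps accumulating while the input-channel tiles are streamed past it, which models output reuse. The tile sizes `Tm` and `Tn`, the loop order and the function names are assumptions for illustration, not the dissertation's actual configuration, and this is plain Python rather than synthesizable code.

```python
def conv_naive(inp, w, K):
    """Reference convolution: inp[n][r][c], w[m][n][i][j], stride 1, no padding."""
    N, M = len(inp), len(w)
    R, C = len(inp[0]) - K + 1, len(inp[0][0]) - K + 1
    out = [[[0] * C for _ in range(R)] for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for r in range(R):
                for c in range(C):
                    for i in range(K):
                        for j in range(K):
                            out[m][r][c] += w[m][n][i][j] * inp[n][r + i][c + j]
    return out


def conv_tiled(inp, w, K, Tm=2, Tn=2):
    """Same result, tiled over output channels (Tm) and input channels (Tn)."""
    N, M = len(inp), len(w)
    R, C = len(inp[0]) - K + 1, len(inp[0][0]) - K + 1
    out = [[[0] * C for _ in range(R)] for _ in range(M)]
    for m0 in range(0, M, Tm):          # one output tile at a time
        for n0 in range(0, N, Tn):      # stream input tiles past that output tile
            for r in range(R):
                for c in range(C):
                    for i in range(K):
                        for j in range(K):
                            # Up to Tm x Tn multiply-accumulates per iteration:
                            # the loops a hardware design would unroll.
                            for m in range(m0, min(m0 + Tm, M)):
                                for n in range(n0, min(n0 + Tn, N)):
                                    out[m][r][c] += w[m][n][i][j] * inp[n][r + i][c + j]
    return out


# Small integer test data so both loop orders give bit-identical results.
inp = [[[(n + r * c) % 5 for c in range(4)] for r in range(4)] for n in range(3)]
w = [[[[(m + n + i - j) % 3 - 1 for j in range(2)] for i in range(2)]
      for n in range(3)] for m in range(3)]
print(conv_tiled(inp, w, K=2) == conv_naive(inp, w, K=2))
```

Because partial sums for an output tile are completed before the next tile begins, a hardware version of this loop order would write each output value to external memory only once.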
Keywords/Search Tags:FPGA, object detection, CNN, hardware accelerator