Acceleration Method Research On CNN Related Object Detection Algorithm Based On OpenCL

Posted on: 2020-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
GTID: 2428330575494850
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of deep learning, CNN-based object detection algorithms have achieved remarkable results and now meet the needs of practical applications. However, convolutional neural networks have extremely high computational complexity and cannot meet real-time requirements when run on CPUs. GPUs are therefore used to accelerate the training and testing of convolutional neural networks, but their high energy consumption makes them unsuitable for embedded applications. As a reconfigurable logic device, the FPGA has a distinct advantage in edge deployment because of its low power consumption, and its low latency also makes it well suited to streaming tasks in the cloud. FPGA-based CNN accelerator design has therefore become an active research topic. However, research on FPGA accelerators for object detection applications is still relatively rare.

This thesis designs a scalable, OpenCL-based FPGA accelerator for CNN-based object detection algorithms. The architecture efficiently accelerates the YOLOv2 algorithm in hardware and ports well to different CNN models and devices. Within the architecture, a deep pipeline formed by cascading multiple kernels effectively alleviates bandwidth pressure; three forms of parallelism are designed to meet the demands of computationally intensive tasks; and a data buffer design based on a folded line buffer supports high throughput. In addition, this thesis proposes a series of hardware-oriented improvements to CNN-based object detection algorithms. Full 8-bit fixed-point quantization and layer fusion of the convolution, batch normalization and activation functions both greatly alleviate bandwidth pressure, and a reasonable adjustment of the YOLOv2 network structure enables some layers to execute in parallel.

Finally, this thesis implements a complete design space exploration process. By combining theoretical models of performance, bandwidth and resource requirements with the methods proposed in this thesis, the peak performance of the architecture on a target board can be obtained, which facilitates rapid deployment across devices. An object detection system for real-time video streams was built with the proposed YOLOv2 accelerator. Experiments were performed on the Intel Arria 10 GX1150 FPGA development board. YOLOv2 was tested at multiple input resolutions, and Tiny YOLOv2 at an input resolution of 416. YOLOv2 with an input resolution of 288 and Tiny YOLOv2 achieve real-time speeds of 35 FPS and 71 FPS, respectively. Compared with existing FPGA-based object detection acceleration schemes, the proposed architecture has two advantages. The first is higher throughput: YOLOv2 with an input resolution of 416 reaches a peak performance of 566 GOP. The second is higher accuracy: the precision loss of YOLOv2 is less than 1%.
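The layer fusion mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's OpenCL implementation: it uses floating point rather than 8-bit fixed point, models each output channel of a convolution as a single dot product, and assumes a leaky ReLU with slope 0.1 as the activation. All function names are hypothetical. The idea is that the batch normalization scale and shift can be folded into the convolution weights and bias ahead of time, so one pass over the data replaces three.

```python
import math


def dot(w, x):
    """Dot product standing in for one output pixel of a convolution."""
    return sum(wi * xi for wi, xi in zip(w, x))


def leaky_relu(z, slope=0.1):
    """Leaky ReLU; the 0.1 slope is an assumption for this sketch."""
    return z if z > 0 else slope * z


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias."""
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - m) * scale + bt)
    return folded_w, folded_b


def unfused_layer(weights, bias, gamma, beta, mean, var, x, eps=1e-5):
    """Reference path: convolution, then batch norm, then activation."""
    out = []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        z = dot(w_c, x) + b_c
        z = g * (z - m) / math.sqrt(v + eps) + bt
        out.append(leaky_relu(z))
    return out


def fused_layer(folded_w, folded_b, x):
    """Fused path: one multiply-accumulate pass plus the activation."""
    return [leaky_relu(dot(w_c, x) + b_c)
            for w_c, b_c in zip(folded_w, folded_b)]
```

Because the folded weights are computed once offline, the intermediate convolution and batch normalization results never need to be written out and read back, which is the kind of bandwidth saving the abstract attributes to layer fusion.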
Keywords/Search Tags:Object Detection, FPGA, YOLOv2, Deep Learning, Heterogeneous computing