Acceleration Method Research On CNN Related Object Detection Algorithm Based On OpenCL

Posted on:2020-10-01

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Wang

Full Text:PDF

GTID:2428330575494850

Subject:Signal and Information Processing

Abstract/Summary:

With the rapid development of deep learning, CNN-based object detection algorithms have achieved remarkable results, even meeting the needs of practical applications. However, convolutional neural networks have extremely high computational complexity and cannot meet real-time requirements when run on CPUs. GPUs are therefore used to accelerate the training and testing of convolutional neural networks, but their energy consumption is too high for embedded applications. As a reconfigurable logic device, the FPGA has a distinct low-power advantage for deployment at the edge, and its low latency also makes it well suited to cloud streaming tasks. FPGA-based CNN accelerator design has therefore become an active research topic, yet research on FPGA accelerators for object detection applications is still relatively rare.

This thesis designs a scalable, OpenCL-based FPGA accelerator for CNN-related object detection algorithms. The architecture efficiently implements hardware acceleration of the YOLOv2 algorithm and ports well to different CNN models and devices. Within the architecture, a deep pipeline formed by cascading multiple kernels effectively alleviates bandwidth pressure; three forms of parallelism are designed to meet the demands of computationally intensive tasks; and a data buffer design based on a folded line buffer supports high throughput. In addition, this thesis proposes a series of hardware-oriented improvements to CNN-related object detection algorithms. Full 8-bit fixed-point quantization and layer fusion of convolution, batch normalization and activation functions both greatly alleviate bandwidth pressure, and a reasonable adjustment of the YOLOv2 network structure enables parallel execution of some layers. Finally, this thesis implements a complete design space exploration process. Combining the theoretical model of performance, bandwidth and resource requirements with the method proposed in this thesis yields the peak performance of the architecture on the target board, which facilitates rapid deployment across devices.

Using the proposed YOLOv2 accelerator, this thesis completes the design of an object detection system for real-time video streaming. The experiments were performed on the Intel Arria 10 GX1150 FPGA Development Board. YOLOv2 was tested with input images of multiple resolutions, and Tiny YOLOv2 with an input image size of 416. YOLOv2 with an input size of 288 and Tiny YOLOv2 achieve real-time speeds of 35 FPS and 71 FPS, respectively. Compared with existing FPGA-based object detection acceleration schemes, the proposed architecture has two advantages. The first is higher throughput: YOLOv2 with an input size of 416 reaches a peak performance of 566 GOP. The second is higher accuracy: the precision loss of YOLOv2 is less than 1%.
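
The layer fusion and 8-bit quantization mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names are hypothetical, the convolution is reduced to a per-channel dot product, the activation is assumed to be a leaky ReLU with slope 0.1, and the quantizer is a generic symmetric per-tensor scheme rather than the thesis's exact fixed-point format. The point is only that batch normalization can be folded into the convolution's weights and bias ahead of time, so a single fused layer produces the same output with fewer intermediate reads and writes.

```python
import math


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel batch-norm parameters into conv weights and bias."""
    folded_w, folded_b = [], []
    for w_row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # BN scale for this channel
        folded_w.append([w * scale for w in w_row])
        folded_b.append((b - m) * scale + bt)   # BN shift absorbed into the bias
    return folded_w, folded_b


def leaky_relu(x, slope=0.1):
    """Leaky ReLU; the slope value is an assumption of this sketch."""
    return x if x > 0 else slope * x


def conv_bn_act_reference(x, weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Unfused reference: dot product, then batch norm, then activation."""
    out = []
    for w_row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        z = g * (z - m) / math.sqrt(v + eps) + bt
        out.append(leaky_relu(z))
    return out


def conv_act_fused(x, folded_w, folded_b):
    """Fused layer: a single dot product with folded parameters, then activation."""
    return [
        leaky_relu(sum(w * xi for w, xi in zip(w_row, x)) + b)
        for w_row, b in zip(folded_w, folded_b)
    ]


def quantize_symmetric_int8(values):
    """Symmetric per-tensor quantization to the range [-127, 127] (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map 8-bit integers back to approximate real values."""
    return [x * scale for x in q]
```

In this sketch the folded weights would be computed once offline and then quantized, so the device only stores 8-bit parameters and never materializes the batch-norm output as a separate tensor.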

Keywords/Search Tags:

Object Detection, FPGA, YOLOv2, Deep Learning, Heterogeneous computing

Related items

1	Design And Implementation Of Deep Learning Acceleration System For Embedded Applications
2	Research And Implementation Of YOLOv2 Network Based On FPGA
3	Design And Implementation Of Faster R-CNN Accelerator Based On Heterogeneous Processor
4	Research On Dense Object Detection Based On Deep Learning
5	Research On Remaining Object Detection Algorithm Based On Improved YOLOv2 Network
6	Research On Real-time Object Detection Based On YOLOv2
7	Research On Steel Stamping Character Recognition Based On Deep Learning YOLOv2 Algorithm
8	Acceleration And Implementation Of Object Detection Algorithm Based On FPGA
9	The Research And Implementation Of Deep Learning Heterogeneous Computing Platform Based On CPU And Multiple FPGA Architecture
10	Research On Real-Time Face Detection Technology Based On FPGA