Object Detection Algorithm Acceleration Based On OpenCL

Posted on:2019-02-18

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhang

Full Text:PDF

GTID:2428330572952209

Subject:Engineering

Abstract/Summary:

Object detection is an important direction in computer vision, with applications in many fields. It aims to determine whether an image or video contains one or more objects of interest, and to determine the category and location of each object. The accuracy of traditional object detection algorithms is limited by feature extraction and localization error. Object detection algorithms based on deep learning frameworks integrate feature extraction with object recognition and localization, and have therefore gradually come to dominate the market. We use OpenCL to accelerate the YOLO detection algorithm, which is based on a deep learning framework. Training and testing speed improve significantly with no loss of accuracy. The main work is as follows:

(1) We use OpenCL to accelerate the convolutional layer, max pooling layer, average pooling layer, batch normalization layer, Softmax layer, loss function, and other layers of the convolutional neural network. On an NVIDIA GTX1080 graphics card, YOLO training achieves a speedup of about 141 times over an Intel CPU, and testing achieves a speedup of about 114 times.

(2) We propose a GPU-based high-speed matrix multiplication to accelerate the convolutional layer, the most time-consuming part of the YOLO algorithm. The method uses optimization strategies including matrix partitioning, coalesced memory access, and computing multiple output elements per thread. On the NVIDIA GTX1080 GPU, it achieves a speedup of approximately 4 times over YOLO's original matrix multiplication.

(3) Because OpenCL code is easy to port across different types of devices, we modify and test the algorithm on Many Integrated Core (MIC) and Field-Programmable Gate Array (FPGA) platforms. To suit the FPGA's architectural characteristics and resource utilization, the matrix multiplication in the convolutional layer is modified to achieve the highest performance at the highest resource utilization. On the Intel FPGA DE5-Net, testing a single image takes about 1.9 seconds.
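
The matrix partitioning mentioned in item (2) can be illustrated with a small sketch. This is not the thesis's OpenCL kernel: it is a CPU-side Python illustration written under stated assumptions, and the function names (`naive_matmul`, `tiled_matmul`) and the tile size are hypothetical. It shows only the blocking idea, where each block of the output is built up from pairs of input sub-blocks; coalesced memory access and computing multiple output elements per thread are GPU-specific and are not modeled here.

```python
def naive_matmul(a, b):
    """Reference C = A x B using the plain triple loop."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def tiled_matmul(a, b, tile=2):
    """Blocked C = A x B, processing one (tile x tile) block of C at a time.

    On a GPU, each output block would typically map to one work-group and
    the input sub-blocks would be staged in local memory. Here that staging
    is only imitated with small Python lists (an assumption for illustration).
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):              # block row of C
        for j0 in range(0, n, tile):          # block column of C
            i1, j1 = min(i0 + tile, m), min(j0 + tile, n)
            for p0 in range(0, k, tile):      # walk along the shared dimension
                p1 = min(p0 + tile, k)
                # Stand-ins for local-memory copies of the two sub-blocks.
                a_blk = [row[p0:p1] for row in a[i0:i1]]
                b_blk = [row[j0:j1] for row in b[p0:p1]]
                # Accumulate this partial product into the output block.
                for i in range(i1 - i0):
                    for j in range(j1 - j0):
                        acc = 0.0
                        for p in range(p1 - p0):
                            acc += a_blk[i][p] * b_blk[p][j]
                        c[i0 + i][j0 + j] += acc
    return c
```

Calling `tiled_matmul(a, b, tile=2)` should give the same result as `naive_matmul(a, b)` for any compatible matrices, including sizes that are not multiples of the tile. The benefit of blocking on real hardware comes from reusing each staged sub-block many times before fetching the next one, which this sketch does not measure.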

Keywords/Search Tags:

Convolutional Neural Network, Object Detection, OpenCL, YOLO, FPGA

Related items

1	Design And Implementation Of Object Detection Algorithm Based On YOLO
2	Research On Real-Time Detection Of Video Object Based On Improved YOLO Model
3	Research On Acceleration Of Convolutional Neural Networks On FPGA Based On OpenCL
4	Design Of YOLOv3-Tiny Algorithm Based On FPGA
5	Research On Light-weight Convolutional Neural Object Detection Network
6	Research And Design Of YOLO V2 Neural Network Accelerator Based On FPGA
7	Design And Implementation Of Object Detection Algorithm Based On YOLO
8	Research On Parallel Acceleration Architecture Convolutional Neural Network Based On FPGA
9	Design And Implementation Of A Target Detection Algorithm For Aerial Images Based On Deep Learning
10	Research On Real-time Object Detection Based On YOLOv2