Object Detection Algorithm Acceleration Based On OpenCL

Posted on: 2019-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2428330572952209
Subject: Engineering
Abstract/Summary:
Object detection is an important direction in computer vision with a wide range of applications. It aims to determine whether an image or video contains one or more objects of interest, and to determine the category and location of each object. The accuracy of traditional object detection algorithms is limited by feature extraction and by positioning error. Object detection algorithms based on deep learning frameworks integrate feature extraction with object recognition and localization, and have therefore gradually come to dominate. We use OpenCL to accelerate YOLO, a detection algorithm based on a deep learning framework. Training and testing speed improve significantly with no reduction in accuracy. The main work is as follows:
(1) We use OpenCL to accelerate the convolutional layer, the max pooling layer, the average pooling layer, the batch normalization layer, the Softmax layer, the loss function, and the other layers of the convolutional neural network. On an NVIDIA GTX 1080 graphics card, YOLO training achieves a speedup of about 141 times over the Intel CPU, and testing achieves a speedup of about 114 times over the Intel CPU.
(2) We propose a GPU-based high-speed matrix multiplication to accelerate the convolutional layer, the most time-consuming part of the YOLO algorithm. The method uses matrix partitioning, coalesced memory access, computation of multiple output elements per thread, and other optimization strategies. On an NVIDIA GTX 1080 GPU, it achieves a speedup of approximately 4 times over the original matrix multiplication of YOLO.
(3) Because OpenCL code is easy to port across different types of devices, we modify and test the algorithm on Many Integrated Core (MIC) and Field-Programmable Gate Array (FPGA) devices. To suit the characteristics and resource utilization of the FPGA architecture, the matrix multiplication in the convolutional layer is modified to achieve the highest performance at the highest resource utilization. On the Intel FPGA DE5-Net, testing a single image takes about 1.9 seconds.
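The abstract states that the convolutional layer is computed as a matrix multiplication and that the multiplication is partitioned into blocks, but it gives no kernel source. The following is a minimal Python sketch of that general idea, not the thesis's OpenCL implementation. The function names (`im2col`, `tiled_matmul`), the tile size, and the single-channel, stride-1, no-padding layout are all assumptions made for illustration. On a GPU, each tile would typically be handled by one work-group; here plain loops only mirror the blocking structure, so the sketch shows correctness of the decomposition rather than any speedup.

```python
def im2col(image, height, width, ksize):
    """Unroll every ksize x ksize patch of a flat, single-channel image
    (stride 1, no padding; assumed layout) into one column of a matrix."""
    out_h, out_w = height - ksize + 1, width - ksize + 1
    cols = [[0.0] * (out_h * out_w) for _ in range(ksize * ksize)]
    for ky in range(ksize):
        for kx in range(ksize):
            row = ky * ksize + kx
            for oy in range(out_h):
                for ox in range(out_w):
                    cols[row][oy * out_w + ox] = float(
                        image[(oy + ky) * width + (ox + kx)]
                    )
    return cols


def tiled_matmul(a, b, tile=2):
    """Compute C = A * B block by block (matrix partitioning).

    `tile` is a hypothetical block edge length; each (i0, j0) block of C
    accumulates partial products from matching blocks of A and B."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


# Usage: a 4x4 image convolved with one 3x3 filter of ones.
image = list(range(16))          # flat row-major 4x4 image
kernel = [[1.0] * 9]             # one filter, flattened to a 1x9 row
result = tiled_matmul(kernel, im2col(image, 4, 4, 3))  # 1x4 output map
```

Each row of `kernel` is one flattened filter, so adding filters only adds rows to the left-hand matrix; the unrolled image matrix is reused for all of them, which is why the matrix multiplication becomes the natural target for optimization.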
Keywords/Search Tags: Convolutional Neural Network, Object Detection, OpenCL, YOLO, FPGA