
Accelerating Convolution Based Detection Model On GPU

Posted on: 2016-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
Full Text: PDF
GTID: 2308330476453267
Subject: Control Science and Engineering

Abstract/Summary:
In recent years, convolution-based detection models (CDMs), such as deformable part-based models (DPMs) and convolutional neural networks (CNNs), have achieved tremendous success in object detection. The simplicity of these models allows very large-scale training, which yields higher robustness and recognition performance. The main bottleneck of these powerful state-of-the-art models is the prohibitive computational cost of convolution during training and evaluation, which has become a major limitation in many practical applications. Fortunately, as general-purpose GPU parallel computing has matured, it offers a practical route to accelerating convolution-based detection models.

In this thesis, after studying convolution-based detection models and analyzing their performance bottlenecks, we accelerate them using mathematical and parallel techniques, without reducing model complexity or sacrificing detection accuracy. On one hand, we leverage the classical convolution theorem to convert convolution in the spatial domain into pointwise multiplication in the frequency domain, reducing the computational cost, and we employ a heuristic bin-packing algorithm to balance the trade-off between computation overhead and storage cost. Experimental results on the public PASCAL VOC dataset demonstrate that the frequency-domain acceleration algorithm efficiently speeds up convolution-based detection models with no loss of accuracy.

On the other hand, after a detailed parallelism analysis of the frequency-domain algorithm, we present a GPU parallel implementation in OpenCL. We apply several optimization methods to improve performance, including memory-access optimization, data-transfer optimization, and control-flow optimization. Experimental results show that our OpenCL implementation achieves significant speedups over the conventional CPU version, and a noticeable improvement even over a well-optimized CPU implementation. In addition, we study the GPU implementation and optimization of HOG feature-pyramid construction; experiments demonstrate a clear speedup over the CPU version.
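The core mathematical idea above, converting spatial convolution into pointwise multiplication in the frequency domain, is the classical convolution theorem. The abstract does not include the thesis's actual code, but the principle can be sketched in NumPy (function name and padding scheme are illustrative, not from the thesis):

```python
import numpy as np

def fft_convolve2d(image, kernel):
    """Full linear 2-D convolution computed via the convolution theorem:
    F(image * kernel) = F(image) . F(kernel), so one forward FFT per input,
    a pointwise product, and one inverse FFT replace the sliding-window sum."""
    # Zero-pad both inputs to the full linear-convolution size so the
    # circular convolution performed by the DFT matches linear convolution.
    h = image.shape[0] + kernel.shape[0] - 1
    w = image.shape[1] + kernel.shape[1] - 1
    F_img = np.fft.rfft2(image, s=(h, w))
    F_ker = np.fft.rfft2(kernel, s=(h, w))
    return np.fft.irfft2(F_img * F_ker, s=(h, w))
```

For a single small filter the FFT overhead may not pay off; the win appears when many filters (as in a DPM or CNN layer) are evaluated against the same image, since the image's forward transform is computed once and reused.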
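The abstract mentions a heuristic bin-packing algorithm for balancing computation overhead against storage cost, but does not say which heuristic the thesis uses. As a representative example of such a heuristic (my choice, not necessarily the thesis's), first-fit decreasing packs items, e.g. zero-padded filter responses sharing FFT buffers, into the fewest fixed-capacity bins:

```python
def first_fit_decreasing(sizes, capacity):
    """First-fit decreasing bin packing: sort items largest-first, then place
    each into the first bin with enough remaining capacity, opening a new bin
    only when none fits. Returns (item -> bin assignment, number of bins)."""
    remaining = []   # remaining capacity of each open bin
    assignment = {}  # item index -> bin index
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        for b, rem in enumerate(remaining):
            if sizes[idx] <= rem:
                remaining[b] -= sizes[idx]
                assignment[idx] = b
                break
        else:
            remaining.append(capacity - sizes[idx])
            assignment[idx] = len(remaining) - 1
    return assignment, len(remaining)
```

First-fit decreasing is a standard choice here because it runs in O(n log n + n·bins) time and is guaranteed to use at most roughly 11/9 of the optimal number of bins.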
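A HOG feature pyramid, mentioned in the final paragraph, computes histograms of gradient orientations over small cells at multiple image scales. A minimal single-scale sketch of the per-cell histogram step (simplified: no block normalization or trilinear interpolation, and the function name is my own) looks like this:

```python
import numpy as np

def hog_cells(image, cell=8, bins=9):
    """Unsigned-orientation gradient histograms over non-overlapping cells.
    Each pixel votes its gradient magnitude into one of `bins` orientation
    bins covering [0, pi); votes are accumulated per cell."""
    gy, gx = np.gradient(image.astype(float))     # central-difference gradients
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation in [0, pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    H, W = image.shape[0] // cell, image.shape[1] // cell
    hist = np.zeros((H, W, bins))
    for r in range(H):
        for c in range(W):
            m = mag[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            b = bin_idx[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            np.add.at(hist[r, c], b.ravel(), m.ravel())  # magnitude-weighted votes
    return hist
```

The pyramid repeats this at a geometric sequence of image scales; since every cell and every scale is independent, the computation maps naturally onto GPU work-items, which is what makes it a good target for the OpenCL acceleration described above.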
Keywords/Search Tags:Object detection, Deformable part-based model, Convolutional neural network, GPU general parallel computing, OpenCL, HOG