
Accelerating Convolution Based Detection Model On GPU

Posted on: 2016-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
Full Text: PDF
GTID: 2308330476453267
Subject: Control Science and Engineering

Abstract/Summary:
In recent years, convolution-based detection models (CDMs), such as deformable part-based models (DPMs) and convolutional neural networks (CNNs), have achieved tremendous success in object detection. The simplicity of these models allows very large-scale training, which yields higher robustness and recognition performance. The main bottleneck of these powerful state-of-the-art models is the prohibitive computational cost of convolution during training and evaluation, which has become a major limitation in many practical applications. Fortunately, as general-purpose GPU parallel computing has matured, it offers a practical route to accelerating convolution-based detection models.

In this thesis, after studying convolution-based detection models and analyzing their performance bottlenecks, we accelerate them using mathematical and parallel techniques, without reducing model complexity or sacrificing detection accuracy. On one hand, we leverage the classical convolution theorem to convert convolution in the spatial domain into pointwise multiplication in the frequency domain, reducing the computational cost, and we employ a heuristic bin-packing algorithm to balance the trade-off between computation overhead and storage cost. Experimental results on the public PASCAL VOC dataset demonstrate that the frequency-domain acceleration algorithm efficiently speeds up convolution-based detection models with no loss of accuracy.

On the other hand, after a detailed parallelism analysis of the frequency-domain algorithm, we present a GPU parallel implementation in OpenCL. We apply several optimization methods to improve performance, including memory-access optimization, data-transfer optimization, and control-flow optimization. Experimental results show that our OpenCL implementation achieves significant speedups over the conventional CPU version, and a noticeable improvement even over a well-optimized CPU implementation. In addition, we study the GPU implementation and optimization of HOG feature-pyramid construction; experiments demonstrate a clear speedup over the CPU version.
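The core mathematical idea above, converting spatial convolution into pointwise multiplication in the frequency domain, is the classical convolution theorem. The abstract does not include the thesis's actual code, but the principle can be sketched in NumPy (function name and padding scheme are illustrative, not from the thesis):

```python
import numpy as np

def fft_convolve2d(image, kernel):
    """Full linear 2-D convolution computed via the convolution theorem:
    F(image * kernel) = F(image) . F(kernel), so one forward FFT per input,
    a pointwise product, and one inverse FFT replace the sliding-window sum."""
    # Zero-pad both inputs to the full linear-convolution size so the
    # circular convolution performed by the DFT matches linear convolution.
    h = image.shape[0] + kernel.shape[0] - 1
    w = image.shape[1] + kernel.shape[1] - 1
    F_img = np.fft.rfft2(image, s=(h, w))
    F_ker = np.fft.rfft2(kernel, s=(h, w))
    return np.fft.irfft2(F_img * F_ker, s=(h, w))
```

For a single small filter the FFT overhead may not pay off; the win appears when many filters (as in a DPM or CNN layer) are evaluated against the same image, since the image's forward transform is computed once and reused.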
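The abstract mentions a heuristic bin-packing algorithm for balancing computation overhead against storage cost, but does not say which heuristic the thesis uses. As a representative example of such a heuristic (my choice, not necessarily the thesis's), first-fit decreasing packs items, e.g. zero-padded filter responses sharing FFT buffers, into the fewest fixed-capacity bins:

```python
def first_fit_decreasing(sizes, capacity):
    """First-fit decreasing bin packing: sort items largest-first, then place
    each into the first bin with enough remaining capacity, opening a new bin
    only when none fits. Returns (item -> bin assignment, number of bins)."""
    remaining = []   # remaining capacity of each open bin
    assignment = {}  # item index -> bin index
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        for b, rem in enumerate(remaining):
            if sizes[idx] <= rem:
                remaining[b] -= sizes[idx]
                assignment[idx] = b
                break
        else:
            remaining.append(capacity - sizes[idx])
            assignment[idx] = len(remaining) - 1
    return assignment, len(remaining)
```

First-fit decreasing is a standard choice here because it runs in O(n log n + n·bins) time and is guaranteed to use at most roughly 11/9 of the optimal number of bins.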
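A HOG feature pyramid, mentioned in the final paragraph, computes histograms of gradient orientations over small cells at multiple image scales. A minimal single-scale sketch of the per-cell histogram step (simplified: no block normalization or trilinear interpolation, and the function name is my own) looks like this:

```python
import numpy as np

def hog_cells(image, cell=8, bins=9):
    """Unsigned-orientation gradient histograms over non-overlapping cells.
    Each pixel votes its gradient magnitude into one of `bins` orientation
    bins covering [0, pi); votes are accumulated per cell."""
    gy, gx = np.gradient(image.astype(float))     # central-difference gradients
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation in [0, pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    H, W = image.shape[0] // cell, image.shape[1] // cell
    hist = np.zeros((H, W, bins))
    for r in range(H):
        for c in range(W):
            m = mag[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            b = bin_idx[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            np.add.at(hist[r, c], b.ravel(), m.ravel())  # magnitude-weighted votes
    return hist
```

The pyramid repeats this at a geometric sequence of image scales; since every cell and every scale is independent, the computation maps naturally onto GPU work-items, which is what makes it a good target for the OpenCL acceleration described above.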
Keywords/Search Tags:Object detection, Deformable part-based model, Convolutional neural network, GPU general parallel computing, OpenCL, HOG