
A Speed Optimization Of Cuda-convnet Deep Convolutional Neural Network Algorithm

Posted on: 2016-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: D X Li
Full Text: PDF
GTID: 2308330503450624
Subject: Computer Science and Technology

Abstract/Summary:
Deep convolutional neural networks are a research focus in applying artificial intelligence to image recognition. The goal of such a network is to extract complex image features through a multilayer structure and to output the classes to which the images belong. When processing a batch of images, the cuda-convnet deep convolutional neural network algorithm is fast; in practice, however, a single image often needs to be processed, and the original algorithm is not efficient enough for that case. Based on a thorough understanding of the cuda-convnet algorithm, this thesis therefore devises a new algorithm that optimizes the speed of the original one.

The thesis first introduces parallelization across images and points out its shortcoming, and then adopts parallelization across feature maps and pixels. In this algorithm, an image is divided into several modules of the same size as the filters. In the feature-map and pixel direction, each thread processes several pixels of one module of the feature map, so summing the results of these threads yields the result of one filter convolving one module of the image. In the filter direction, the parallelization completes the task of all filters convolving one module; the algorithm then traverses all modules. The convolution result is that each image has the same number of feature maps as there are filters.

Finally, experiments measure the convolution time of each convolution layer and the processing time of a whole image. The proposed algorithm processes one image in 2.2 milliseconds, which is six times faster than the cuda-convnet algorithm and has practical application value.
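To make the described parallelization scheme concrete, the CUDA sketch below illustrates it under stated assumptions; it is not the thesis's actual code. The kernel name (convModuleKernel), the 5x5 module size, the 32-thread block, and the single-channel row-major image layout are illustrative choices. The sketch maps one block to one filter (feature map), splits the pixels of one module across the block's threads, sums the per-thread partial products with a shared-memory reduction, and traverses all modules of the image in an outer loop.

// Minimal sketch of the feature-map/pixel parallelization, assuming a
// single-channel image, one block per filter, and 32 threads per module.
#include <cuda_runtime.h>

#define FILTER_W 5                           // module (filter) width, assumed
#define MODULE_PIXELS (FILTER_W * FILTER_W)
#define THREADS_PER_MODULE 32                // threads cooperating on one module

__global__ void convModuleKernel(const float* image, const float* filters,
                                 float* featureMaps, int imgW, int outW)
{
    int f = blockIdx.x;                              // which filter / feature map
    const float* filt = filters + f * MODULE_PIXELS;

    __shared__ float partial[THREADS_PER_MODULE];

    // Traverse all modules (output positions) of the image.
    for (int my = 0; my < outW; ++my) {
        for (int mx = 0; mx < outW; ++mx) {
            float acc = 0.0f;
            // Each thread multiply-accumulates several pixels of this module.
            for (int p = threadIdx.x; p < MODULE_PIXELS; p += blockDim.x) {
                int py = p / FILTER_W, px = p % FILTER_W;
                acc += image[(my + py) * imgW + (mx + px)] * filt[p];
            }
            partial[threadIdx.x] = acc;
            __syncthreads();

            // Sum the per-thread partial products: the result of this filter
            // convolving one module of the image.
            for (int s = blockDim.x / 2; s > 0; s >>= 1) {
                if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
                __syncthreads();
            }
            if (threadIdx.x == 0)
                featureMaps[(f * outW + my) * outW + mx] = partial[0];
            __syncthreads();
        }
    }
}

A launch such as convModuleKernel<<<numFilters, THREADS_PER_MODULE>>>(d_image, d_filters, d_featureMaps, imgW, imgW - FILTER_W + 1) would produce one feature map per filter, matching the convolution result described above.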
Keywords/Search Tags: Deep Convolutional Neural Network, CUDA Multithreading, Parallelization