
A Speed Optimization Of Cuda-convnet Deep Convolutional Neural Network Algorithm

Posted on: 2016-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: D X Li
Full Text: PDF
GTID: 2308330503450624
Subject: Computer Science and Technology

Abstract/Summary:
Deep convolutional neural networks are a research focus in applying artificial intelligence to image recognition. The goal of such a network is to extract complex image features through a multilayer structure and to output the classes to which the images belong. When processing a batch of images, the cuda-convnet deep convolutional neural network algorithm is fast; in practice, however, a single image often needs to be processed, and the original algorithm is not efficient enough for that case. Based on a thorough understanding of the cuda-convnet algorithm, this thesis therefore devises a new algorithm that optimizes the speed of the original one.

The thesis first introduces parallelization across images and points out its shortcoming, and then adopts parallelization across feature maps and pixels. In this algorithm, an image is divided into several modules of the same size as the filters. In the feature-map and pixel direction, each thread processes several pixels of one module of the feature map, so summing the results of these threads yields the result of one filter convolving one module of the image. In the filter direction, the parallelization completes the task of all filters convolving one module; the algorithm then traverses all modules. The convolution result is that each image has the same number of feature maps as there are filters.

Finally, experiments measure the convolution time of each convolution layer and the processing time of a whole image. The proposed algorithm processes one image in 2.2 milliseconds, which is six times faster than the cuda-convnet algorithm and has practical application value.
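To make the described parallelization scheme concrete, the CUDA sketch below illustrates it under stated assumptions; it is not the thesis's actual code. The kernel name (convModuleKernel), the 5x5 module size, the 32-thread block, and the single-channel row-major image layout are illustrative choices. The sketch maps one block to one filter (feature map), splits the pixels of one module across the block's threads, sums the per-thread partial products with a shared-memory reduction, and traverses all modules of the image in an outer loop.

// Minimal sketch of the feature-map/pixel parallelization, assuming a
// single-channel image, one block per filter, and 32 threads per module.
#include <cuda_runtime.h>

#define FILTER_W 5                           // module (filter) width, assumed
#define MODULE_PIXELS (FILTER_W * FILTER_W)
#define THREADS_PER_MODULE 32                // threads cooperating on one module

__global__ void convModuleKernel(const float* image, const float* filters,
                                 float* featureMaps, int imgW, int outW)
{
    int f = blockIdx.x;                              // which filter / feature map
    const float* filt = filters + f * MODULE_PIXELS;

    __shared__ float partial[THREADS_PER_MODULE];

    // Traverse all modules (output positions) of the image.
    for (int my = 0; my < outW; ++my) {
        for (int mx = 0; mx < outW; ++mx) {
            float acc = 0.0f;
            // Each thread multiply-accumulates several pixels of this module.
            for (int p = threadIdx.x; p < MODULE_PIXELS; p += blockDim.x) {
                int py = p / FILTER_W, px = p % FILTER_W;
                acc += image[(my + py) * imgW + (mx + px)] * filt[p];
            }
            partial[threadIdx.x] = acc;
            __syncthreads();

            // Sum the per-thread partial products: the result of this filter
            // convolving one module of the image.
            for (int s = blockDim.x / 2; s > 0; s >>= 1) {
                if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
                __syncthreads();
            }
            if (threadIdx.x == 0)
                featureMaps[(f * outW + my) * outW + mx] = partial[0];
            __syncthreads();
        }
    }
}

A launch such as convModuleKernel<<<numFilters, THREADS_PER_MODULE>>>(d_image, d_filters, d_featureMaps, imgW, imgW - FILTER_W + 1) would produce one feature map per filter, matching the convolution result described above.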
Keywords/Search Tags: Deep Convolutional Neural Network, CUDA Multithreading, Parallelization