
Research On Parallel Algorithm Of Convolutional Neural Network Based On GPU

Posted on: 2019-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Cui
Full Text: PDF
GTID: 2428330548995000
Subject: Computer Science and Technology

Abstract:
At present, the convolutional neural network is one of the focuses of deep learning research; with its excellent recognition performance, it has received more and more attention. Because the training process of a convolutional neural network involves a large number of parameters and tens of millions of calculations, it takes a great deal of time to obtain a usable network, so GPU acceleration is usually employed for training. However, the GPU microarchitecture is complex, the training process involves many parameters, the exchange speed with the computer's main memory is limited, and the programming often fails to exploit the characteristics of the GPU, so the training process cannot make full use of the GPU's computing performance. Aiming at the problem that the convolution training process cannot fully utilize GPU computing performance, this thesis optimizes the parallel algorithm of the convolutional neural network on a Kepler-architecture K40 GPU. The main research work of this thesis includes the following two aspects.

First, this thesis presents an improved convolution matrix multiplication algorithm. The training process of the convolutional neural network is studied and the convolution calculation formula is derived, taking convolution matrix multiplication as the object of optimization. The features of the GPU microarchitecture are then analyzed and several performance indexes are given; the key factors affecting GPU performance are shared memory, the register file, and so on. Next, the task partitioning of convolution matrix multiplication is analyzed, and the algorithm is implemented with the CUDA programming framework. Finally, the validity and correctness of the algorithm are verified by experiments.

Second, based on the improved convolution matrix multiplication algorithm, a loop unrolling method is proposed. By analyzing the conditions under which loop unrolling applies and the factors that influence the unroll factor, the unrolling condition ensures that the kernel is launched in a more favorable configuration, while the influence factors determine the size of the unroll factor. To determine an effective unroll factor, this thesis designs and implements the loop unrolling procedure and finds the optimal factor through testing. Finally, the validity and correctness of the method are verified by experiments.

Experiments show that the convolution matrix multiplication algorithm based on the improved performance indexes is effective, reaching a computational performance of 2115 GFLOPS. Combining this algorithm with the loop unrolling method raises the GPU computing performance to 2238 GFLOPS, a clear improvement over the performance before optimization. The optimized convolution matrix multiplication is applied to the convolutional neural network training process: relative to the Caffe library, the implementation in this thesis achieves an average speedup of 1.91, and relative to cuda-convnet it achieves an average speedup of 0.98. Therefore, the optimization presented in this thesis has considerable practical value.
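The abstract describes the technique only at a high level; the following CUDA sketch illustrates the general approach it names: convolution lowered to a matrix multiplication (a filter matrix times an im2col-expanded input matrix) computed by a kernel that stages tiles in shared memory and accumulates in registers. The kernel name matMulTiled, the tile width TILE = 32, and the row-major layouts are illustrative assumptions, not the author's actual implementation or K40 tuning.

#include <cuda_runtime.h>

#define TILE 32  // assumed tile width; the thesis tunes such parameters against the K40's shared memory and registers

// Tiled matrix multiplication C = A * B, where A is (M x K) and B is (K x N).
// For convolution, A would hold the filter weights and B the im2col-expanded input.
__global__ void matMulTiled(const float* A, const float* B, float* C,
                            int M, int N, int K) {
    __shared__ float As[TILE][TILE];   // tile of A staged in shared memory
    __shared__ float Bs[TILE][TILE];   // tile of B staged in shared memory

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;                  // per-thread accumulator kept in a register

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        // Cooperative load of one tile of A and one tile of B, with bounds checks for edge tiles.
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        // Multiply the two tiles; this inner loop is a natural target for unrolling.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}

A launch would use dim3 block(TILE, TILE) and a grid of ((N + TILE - 1) / TILE, (M + TILE - 1) / TILE) blocks; the shared-memory and register usage per block are exactly the kind of performance indexes the thesis analyzes when choosing the task partition.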
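The loop unrolling step can be illustrated on the inner-product loop of a kernel like the one above, shown here on contiguous arrays for simplicity. This is a minimal sketch of manual unrolling with independent accumulators; the unroll factor 4 and the helper name dotUnrolled4 are assumptions for illustration (the thesis determines the effective unroll factor experimentally), and TILE is assumed to be a multiple of the unroll factor. The directive #pragma unroll 4 would request the same transformation from the compiler.

// Manually unrolled inner-product loop, processing four multiply-adds per iteration.
__device__ float dotUnrolled4(const float* __restrict__ a,
                              const float* __restrict__ b) {
    // Four independent accumulators expose instruction-level parallelism,
    // at the cost of extra registers per thread (one of the performance
    // indexes discussed above).
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    for (int k = 0; k < TILE; k += 4) {
        acc0 += a[k + 0] * b[k + 0];
        acc1 += a[k + 1] * b[k + 1];
        acc2 += a[k + 2] * b[k + 2];
        acc3 += a[k + 3] * b[k + 3];
    }
    return acc0 + acc1 + acc2 + acc3;
}

Larger unroll factors reduce loop overhead but increase register pressure, which can lower occupancy; this trade-off is why the thesis searches for the optimal factor by testing rather than fixing it in advance.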
Keywords/Search Tags: GPU Microarchitecture, Convolutional Neural Network, Convolution Matrix Multiplication, Loop Unrolling