
Research On Parallelization Of Deep Learning Algorithms Based On GPU

Posted on: 2018-03-15  Degree: Master  Type: Thesis
Country: China  Candidate: Y R Jin  Full Text: PDF
GTID: 2348330542468914  Subject: Computer Science and Technology
Abstract/Summary:
Because of its outstanding performance in image recognition, speech recognition, natural language processing, and other fields, deep learning has become a research hotspot in both academia and industry. A neural network usually contains a large number of parameters to be trained, so obtaining a well-performing network is time-consuming. Moreover, in order to learn more valuable features from massive data, networks are being made ever deeper, which further increases training time. How to improve training speed and shorten the training cycle has therefore become an important research direction in deep learning. In recent years, general-purpose computing on graphics processing units (GPGPU) has developed rapidly, and the floating-point throughput of a mainstream GPU is roughly ten times that of a mainstream CPU. With its powerful parallel computing capability and high throughput, the GPU has become the main accelerator in high-performance computing. Based on the above analysis, and after surveying existing parallel acceleration algorithms, this paper adopts the idea of unrolling convolution into matrix multiplication and parallelizes deep learning algorithms on the CUDA computing framework to further improve GPU parallel efficiency. The main work of this paper is as follows:

1) The basic idea and network structure of neural networks are analyzed, and the back-propagation algorithm of the traditional artificial neural network is studied in detail. We focus on the characteristics of sparse connectivity and weight sharing, and derive the computations for convolution, pooling, and gradient calculation to provide theoretical guidance for parallelizing deep neural networks. We also study the GPU hardware architecture as well as the thread hierarchy and programming model of CUDA.

2) On the CUDA platform, we design and implement the forward computation, back propagation, and parameter updates of the layers of convolutional neural networks on the GPU, adopting the idea of unrolling the convolution into a matrix operation together with the ReLU activation function. We then describe the construction procedure and parameter-initialization methods of the networks, and finally describe the training process in detail.

3) The implemented hidden layers are used to construct three networks of different sizes: LeNet-5, CIFAR-10, and AlexNet. These networks are trained on both CPU and GPU using the MNIST, CIFAR-10, and ImageNet data sets, respectively. The accuracy of the three networks on the GPU does not decrease, and their speedups are 8.1x, 33.5x, and 48.9x, respectively. Compared with current frameworks, the parallel acceleration method proposed in this paper has certain advantages.
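The "unrolling convolution into a matrix operation" idea can be sketched as follows. This is a minimal NumPy illustration, not the thesis's CUDA implementation: the im2col step copies each receptive field of the input into a column, so that the entire convolution layer (followed by ReLU) reduces to a single matrix multiplication, which is exactly the operation a GPU GEMM kernel executes efficiently. The function names `im2col` and `conv2d_as_matmul` are illustrative assumptions.

```python
import numpy as np

def im2col(x, kh, kw):
    # x: (C, H, W) input. Unroll each kh x kw receptive field into one
    # column of the output matrix (valid convolution, stride 1).
    C, H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, oh * ow))
    idx = 0
    for i in range(oh):
        for j in range(ow):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols  # shape: (C*kh*kw, oh*ow)

def conv2d_as_matmul(x, w, b):
    # w: (K, C, kh, kw) filters, b: (K,) bias.
    # The convolution becomes one GEMM:
    #   (K, C*kh*kw) @ (C*kh*kw, oh*ow) -> (K, oh*ow)
    K, C, kh, kw = w.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = w.reshape(K, -1) @ im2col(x, kh, kw) + b[:, None]
    return np.maximum(out, 0).reshape(K, oh, ow)  # ReLU activation
```

On the GPU, the im2col buffer is built once per input and the GEMM is handed to a highly optimized kernel, which is why this reformulation tends to outperform a direct loop-based convolution.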
Keywords/Search Tags: Deep Learning, GPGPU, CUDA, Unrolling Convolution, ReLU