
Research On Parallelization Of Deep Learning Algorithms Based On GPU

Posted on: 2018-03-15  Degree: Master  Type: Thesis
Country: China  Candidate: Y R Jin  Full Text: PDF
GTID: 2348330542468914  Subject: Computer Science and Technology
Abstract/Summary:
Because of its outstanding performance in image recognition, speech recognition, natural language processing, and other fields, deep learning has become a research hotspot in both academia and industry. A neural network usually contains a large number of parameters to be trained, so obtaining a well-performing network is time-consuming. Moreover, in order to learn more valuable features from massive data, networks are being made ever deeper, which further increases training time. How to improve training speed and shorten the training cycle has therefore become an important research direction in deep learning. In recent years, general-purpose computing on graphics processing units (GPGPU) has developed rapidly, and the floating-point throughput of a mainstream GPU is roughly ten times that of a mainstream CPU. With its powerful parallel computing capability and high throughput, the GPU has become the main accelerator in high-performance computing. Based on the above analysis, and after surveying existing parallel acceleration algorithms, this paper adopts the idea of unrolling convolution into matrix multiplication and parallelizes deep learning algorithms on the CUDA computing framework to further improve GPU parallel efficiency. The main work of this paper is as follows:

1) The basic idea and network structure of neural networks are analyzed, and the back-propagation algorithm of the traditional artificial neural network is studied in detail. We focus on the characteristics of sparse connectivity and weight sharing, and derive the computations for convolution, pooling, and gradient calculation to provide theoretical guidance for parallelizing deep neural networks. We also study the GPU hardware architecture as well as the thread hierarchy and programming model of CUDA.

2) On the CUDA platform, we design and implement the forward computation, back propagation, and parameter updates of the layers of convolutional neural networks on the GPU, adopting the idea of unrolling the convolution into a matrix operation together with the ReLU activation function. We then describe the construction procedure and parameter-initialization methods of the networks, and finally describe the training process in detail.

3) The implemented hidden layers are used to construct three networks of different sizes: LeNet-5, CIFAR-10, and AlexNet. These networks are trained on both CPU and GPU using the MNIST, CIFAR-10, and ImageNet data sets, respectively. The accuracy of the three networks on the GPU does not decrease, and their speedups are 8.1x, 33.5x, and 48.9x, respectively. Compared with current frameworks, the parallel acceleration method proposed in this paper has certain advantages.
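The "unrolling convolution into a matrix operation" idea can be sketched as follows. This is a minimal NumPy illustration, not the thesis's CUDA implementation: the im2col step copies each receptive field of the input into a column, so that the entire convolution layer (followed by ReLU) reduces to a single matrix multiplication, which is exactly the operation a GPU GEMM kernel executes efficiently. The function names `im2col` and `conv2d_as_matmul` are illustrative assumptions.

```python
import numpy as np

def im2col(x, kh, kw):
    # x: (C, H, W) input. Unroll each kh x kw receptive field into one
    # column of the output matrix (valid convolution, stride 1).
    C, H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, oh * ow))
    idx = 0
    for i in range(oh):
        for j in range(ow):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols  # shape: (C*kh*kw, oh*ow)

def conv2d_as_matmul(x, w, b):
    # w: (K, C, kh, kw) filters, b: (K,) bias.
    # The convolution becomes one GEMM:
    #   (K, C*kh*kw) @ (C*kh*kw, oh*ow) -> (K, oh*ow)
    K, C, kh, kw = w.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = w.reshape(K, -1) @ im2col(x, kh, kw) + b[:, None]
    return np.maximum(out, 0).reshape(K, oh, ow)  # ReLU activation
```

On the GPU, the im2col buffer is built once per input and the GEMM is handed to a highly optimized kernel, which is why this reformulation tends to outperform a direct loop-based convolution.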
Keywords/Search Tags: Deep Learning, GPGPU, CUDA, Unrolling Convolution, ReLU