
Research on a Parallel Accelerated Training Algorithm for Multilayer Neural Networks Based on Multiple GPUs

Posted on: 2016-02-10    Degree: Master    Type: Thesis
Country: China    Candidate: Z J Bi    Full Text: PDF
GTID: 2348330503486896    Subject: Computer Science and Technology
Abstract/Summary:
In the early days of neural network research, models were mainly trained on CPUs or CPU clusters because of the limitations of hardware at the time. For the computation-intensive multilayer neural networks used in deep learning, training in a traditional CPU environment carries a high time cost: the hardware's computing resources become the bottleneck, and excessive training time is a key factor limiting overall system efficiency.

Today the GPU is a general-purpose computing device built from massive numbers of computing units, and training neural networks on GPUs has become a trend. Unlike the inefficient serial training process of the CPU environment, training on GPUs requires fully utilizing the hardware's computing resources and exploiting the parallelism inherent in the neural network model. The main work of this thesis concerns how to dispatch training data among multiple GPUs and how to exchange parameters and gradients between the parameter server and the GPU clients.

This thesis is devoted to parallel accelerated training algorithms for multilayer neural networks. It first makes a thorough study of supervised learning for multilayer neural networks. Based on the parallel structure of the network and its learning procedure, mini-batch training with feedforward propagation and error back-propagation is parallelized on a single GPU using the CUDA platform, implementing both model parallelism and data parallelism. A detailed analysis then identifies the critical resources that limit training speed under the traditional asynchronous stochastic gradient descent (SGD) algorithm, and three improvements are proposed: first, a duplicate copy of the model parameters is kept in the parameter server; second, a mechanism for dispatching mini-batch data is designed; third, an independent thread is used to dispatch gradients between the GPUs and the parameter server. Finally, a multi-GPU system is implemented with the improved algorithm, realizing data parallelism over multiple model replicas. A sketch of how such a loop fits together is given below.

In the experiments, the benchmark is the measured training time. The same network is trained on a single GPU and on a CPU for comparison, and comparative experiments with the popular deep learning framework DMLC show that the acceleration achieved by the proposed system is remarkable. Further experiments vary the mini-batch size and the number of GPUs. Compared with the traditional algorithm, the effectiveness of the improved asynchronous SGD algorithm is validated by the experiments.
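The following is a minimal, single-process sketch of how the improved asynchronous SGD loop described above could be organized, using Python threads as stand-ins for GPU clients. It is not the thesis's actual implementation: the class and function names, the sizes, and the dummy gradient computation are all illustrative assumptions. What it does show is the role of the three improvements: a read replica of the parameters on the server, a queue that dispatches mini-batches to workers, and an independent thread that moves gradients from workers to the server so that computation and communication overlap.

```python
import queue
import threading
import numpy as np

# Hypothetical sizes; the thesis does not specify these values.
NUM_WORKERS = 4        # stand-in for the number of GPU clients
NUM_BATCHES = 64
PARAM_DIM = 1024
LEARNING_RATE = 0.01

class ParameterServer:
    """Async-SGD parameter server keeping a duplicate of the parameters.

    Workers fetch from the read replica, so fetches do not block the
    gradient-update path (improvement 1)."""
    def __init__(self, dim):
        self.params = np.zeros(dim, dtype=np.float32)
        self.replica = self.params.copy()   # duplicate used for reads
        self.lock = threading.Lock()

    def fetch(self):
        # Reads hit the replica; no lock is taken on this hot path.
        return self.replica.copy()

    def apply_gradient(self, grad):
        with self.lock:
            self.params -= LEARNING_RATE * grad
            self.replica = self.params.copy()  # refresh the read copy

def compute_gradient(params, batch):
    # Placeholder for the forward/backward pass that the thesis runs
    # on the GPU with CUDA; here just a dummy gradient of the right shape.
    return 0.001 * params + batch.mean()

def worker(ps, batch_queue, grad_queue):
    """One simulated GPU client: pull a mini-batch, compute, hand off."""
    while True:
        batch = batch_queue.get()
        if batch is None:            # poison pill: no more batches
            grad_queue.put(None)
            return
        params = ps.fetch()
        grad_queue.put(compute_gradient(params, batch))

def gradient_dispatcher(ps, grad_queue, num_workers):
    """Independent thread pushing gradients to the server (improvement 3),
    so workers never stall on the update itself."""
    done = 0
    while done < num_workers:
        grad = grad_queue.get()
        if grad is None:
            done += 1
        else:
            ps.apply_gradient(grad)

ps = ParameterServer(PARAM_DIM)
batch_queue = queue.Queue()   # mini-batch dispatching (improvement 2)
grad_queue = queue.Queue()

for _ in range(NUM_BATCHES):
    batch_queue.put(np.random.randn(256, 8).astype(np.float32))
for _ in range(NUM_WORKERS):
    batch_queue.put(None)

threads = [threading.Thread(target=worker, args=(ps, batch_queue, grad_queue))
           for _ in range(NUM_WORKERS)]
dispatcher = threading.Thread(target=gradient_dispatcher,
                              args=(ps, grad_queue, NUM_WORKERS))
for t in threads:
    t.start()
dispatcher.start()
for t in threads:
    t.join()
dispatcher.join()
print("final parameter norm:", np.linalg.norm(ps.params))
```

In a real multi-GPU deployment the workers would each own a model replica in GPU memory and the queues would be replaced by host-device transfers, but the scheduling structure, asynchronous fetches from a read copy plus a dedicated gradient-dispatch thread, is the same idea the abstract describes.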
Keywords/Search Tags: multi-GPU, parallel acceleration, multilayer neural network, back-propagation algorithm, asynchronous stochastic gradient descent algorithm