
Parallel Algorithm Of Convolutional Neural Network In Multi-GPU Environment

Posted on: 2017-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Wang
Full Text: PDF
GTID: 2308330485453789
Subject: Computer application technology
Abstract/Summary:
With the development of deep learning, convolutional neural networks have attracted increasing attention in the field of image recognition because of their excellent performance. Since such networks usually contain a massive number of parameters, training a usable convolutional neural network is very time-consuming, and accelerating the training process is therefore an important research topic in deep learning. Multi-GPU parallelism is the usual answer, and the main acceleration strategies are model parallelism and data parallelism. Algorithms based on model parallelism are hard to load-balance and achieve poor speedup, while the parallel structures used in existing data-parallel algorithms share a common problem: the update tasks cannot be evenly distributed, so these algorithms cannot make full use of the available computing resources.

To address these problems, after studying current multi-GPU parallel algorithms, this thesis proposes a circle (ring) structure for data parallelism that draws on asynchronous stochastic gradient descent and uses a delayed model-update strategy, improving multi-GPU parallel efficiency. The main work is as follows:

1) Analyzing the structure of the hidden layers of a convolutional neural network. The model is trained with stochastic gradient descent, and on that basis the update formulas for the trainable parameters of the convolutional, pooling, and fully connected layers are derived, in preparation for the parallel algorithm design and its implementation (the notation is sketched after this abstract).

2) Proposing a data-parallel scheme based on the circle structure. After comparing model parallelism and data parallelism, data parallelism is pursued further for its better scalability. To solve the load-imbalance problem of existing data-parallel schemes, the circle structure is proposed for organizing the GPU nodes: every GPU both trains and updates the model, so the computation tasks are evenly divided (see the sketch after this abstract). Finally, the parallel performance of the new scheme is analyzed theoretically.

3) Implementing the convolutional neural network on multiple GPUs. Following the training formulas, a single-GPU version of the code is implemented first, and a suitable scheme is chosen to initialize the model parameters. Using an appropriate synchronization strategy and organizing the data transfers between GPUs in the circle structure, the data-parallel version is then built on top of the single-GPU version. In addition, each GPU runs two threads, one for calculation and one for transmission, so that calculation overlaps transmission (a threading sketch also follows).

4) Experimentally evaluating the circle-based parallel scheme. On the MNIST and CIFAR-10 datasets, the circle-based scheme is used to train convolutional neural networks, and with four GPUs it achieves speedups of 3.77x and 3.79x respectively, a parallel efficiency of about 94%. The new scheme is also compared with training under a synchronous master-slave structure and a reduction-tree structure; the comparison shows that the new scheme achieves better parallel speedup.
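Item 1 derives the layer-wise update formulas, which the abstract does not reproduce. As a hedged sketch of the notation involved, the mini-batch SGD update and the delayed-update variant underlying the circle structure can be written as follows; the symbols eta, m, and the staleness tau are illustrative choices, not the thesis's own notation:

```latex
% Mini-batch SGD update for a trainable parameter W (the weights of a
% convolutional kernel or of a fully connected layer alike):
W_{t+1} \;=\; W_t \;-\; \eta \, \frac{1}{m} \sum_{i=1}^{m}
              \frac{\partial L(x_i;\, W_t)}{\partial W}

% Delayed-update variant used with the circle structure: the gradient
% applied at step t+1 was computed on a replica that is tau steps stale,
% as in asynchronous SGD:
W_{t+1} \;=\; W_t \;-\; \eta \, \nabla L\!\left(W_{t-\tau}\right)
```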
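For item 2, the following is a minimal Python sketch of how a circle structure can evenly distribute the update work, with four workers standing in for GPUs and a linear least-squares model standing in for the CNN. The names (local_gradient, ring_step) and the one-step staleness are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

N_WORKERS = 4    # GPU nodes arranged in a circle: 0 -> 1 -> 2 -> 3 -> 0
LR = 0.01        # learning rate (hypothetical value)

def local_gradient(model, batch):
    """Stand-in for one forward/backward pass on this worker's data shard.
    A linear least-squares model replaces the CNN for brevity."""
    x, y = batch
    return x.T @ (x @ model - y) / len(x)

def ring_step(models, shards):
    """One delayed-update step. Every worker computes a gradient on its own
    replica, then applies it to the replica received from its predecessor
    in the circle, so each GPU performs exactly one update per step and
    the update work is evenly divided."""
    grads = [local_gradient(models[i], shards[i]) for i in range(N_WORKERS)]
    # Replicas travel one position around the circle; the locally computed
    # gradient is now one step stale (the delayed-update strategy).
    received = [models[(i - 1) % N_WORKERS] for i in range(N_WORKERS)]
    return [received[i] - LR * grads[i] for i in range(N_WORKERS)]

# Usage: four random shards, a few steps around the circle.
rng = np.random.default_rng(0)
models = [rng.normal(size=(8, 1)) for _ in range(N_WORKERS)]
shards = [(rng.normal(size=(32, 8)), rng.normal(size=(32, 1)))
          for _ in range(N_WORKERS)]
for _ in range(10):
    models = ring_step(models, shards)
```

Unlike a master-slave structure, no single node serializes the updates here, which is the load-balancing property the thesis attributes to the circle structure.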
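For item 3, a minimal sketch of the two-threads-per-GPU idea, using host threads and a queue so the "calculation" thread can hand finished replicas to the "transmission" thread and keep computing. On real hardware the transfer would be a peer-to-peer GPU copy, which this sketch only marks with a comment:

```python
import threading
import queue

send_q = queue.Queue(maxsize=1)  # small buffer: compute stays one step ahead

def compute_loop(steps):
    """'Calculation' thread: trains a replica, enqueues it for transfer,
    and immediately continues with the next step, so calculation overlaps
    transmission."""
    for step in range(steps):
        replica = f"replica@step{step}"  # stand-in for trained parameters
        send_q.put(replica)

def transfer_loop(steps):
    """'Transmission' thread: forwards each replica to the next GPU in the
    circle (a cudaMemcpyPeerAsync or similar copy on real hardware)."""
    for _ in range(steps):
        replica = send_q.get()
        print("sent", replica)

calc = threading.Thread(target=compute_loop, args=(3,))
xmit = threading.Thread(target=transfer_loop, args=(3,))
calc.start(); xmit.start()
calc.join(); xmit.join()
```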
Keywords/Search Tags: GPU, deep learning, convolutional neural network, data parallel, model parallel