
Performance Heterogeneity-Oriented Convolution Neural Network Parallel Optimization

Posted on: 2018-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: J F Xiao
Full Text: PDF
GTID: 2348330515966792
Subject: Computer Science and Technology
Abstract/Summary:
A convolutional neural network (CNN) is a specialized artificial neural network model. It is a deep neural network method that combines the traditional artificial neural network with deep learning network models and is designed especially for recognizing two-dimensional images. The application of CNNs to large-scale computer vision is an active research topic. In recent years, as the number of model parameters and the size of the training data have grown, traditional single-machine training has become impractical because it cannot finish within an acceptable time. Distributed CNN training frameworks such as Cuda-convnet and Deeplearning4j address this problem. However, current training frameworks often apply only plain data parallelism or model parallelism to accelerate training, so the computing resources of a performance-heterogeneous architecture are not fully utilized.

To speed up CNN training and improve the performance of the training framework, this thesis draws on the theory of performance-heterogeneous systems and distributed systems. From the perspective of the CNN training method, it proposes a parallel optimization strategy for performance-heterogeneous systems, called delay synchronous parallel, that is built on a parameter server. The strategy aims to make full use of the heterogeneous system's performance, and it improves training speed while ensuring that the model still converges efficiently. Decoupling the distributed training nodes reduces the dependence of the whole training process on heterogeneous computing resources, so that the CNN framework adapts better to heterogeneous system environments.

The major contributions of this thesis are as follows:

(1) To address the performance problem of large-scale CNN training, a parallel computing model for large-scale CNNs (the multi-iterative parallel computing model) is proposed. It combines parallel computing and big data processing techniques and takes the performance heterogeneity of the parallel system into account. Based on this model, optimization methods are proposed for the data layer, the computing layer, and the communication layer. An optimization scheme for each layer is then derived from the characteristics of CNNs, and a cost analysis model is constructed to guide the dynamic management of resources.

(2) Building on the optimization scheme of the multi-iterative parallel model, a parameter server is introduced. Drawing on both synchronous and asynchronous parallelism, a parallel strategy for performance-heterogeneous systems, called delay synchronization, is proposed. This strategy accelerates model training while ensuring that the model converges. Because it also reduces the dependence on hardware resources, it adapts better to heterogeneous training environments and helps optimize the communication process.

(3) The delay synchronization parallel strategy is implemented with MPI and Pthread in a master-slave structure. Experiments on the MNIST handwritten digit dataset show that, in a performance-heterogeneous environment, the delay synchronization strategy reduces the dependence of the training process on computing resources. The training process conforms to the description of the performance model, and the strategy achieves better speedup and scalability than the traditional data-parallel algorithm.
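
The abstract does not spell out the exact rule behind the delay synchronization strategy in contribution (2). One plausible reading, sketched below in Python purely as an illustration, is a bounded-delay rule that sits between synchronous and asynchronous parallelism: a worker pushes gradients to the parameter server without a global barrier, but it must wait once it runs more than a fixed number of iterations ahead of the slowest worker. The class name DelaySyncServer, the max_delay parameter, and the plain gradient-descent update are all hypothetical and are not taken from the thesis, whose implementation uses MPI and Pthread.

```python
from typing import List


class DelaySyncServer:
    """Toy single-process parameter server with a bounded-delay rule.

    Assumption for illustration only: this is not the thesis's implementation.
    """

    def __init__(self, num_workers: int, max_delay: int, dim: int = 1) -> None:
        self.max_delay = max_delay          # hypothetical bound on the iteration gap
        self.weights: List[float] = [0.0] * dim
        self.clocks: List[int] = [0] * num_workers  # iterations finished per worker

    def can_proceed(self, worker: int) -> bool:
        # A worker may start another iteration only while it is at most
        # max_delay iterations ahead of the slowest worker.
        return self.clocks[worker] - min(self.clocks) <= self.max_delay

    def push(self, worker: int, gradient: List[float], lr: float = 0.1) -> None:
        # Apply the gradient immediately (no global barrier), then advance
        # this worker's iteration clock.
        self.weights = [w - lr * g for w, g in zip(self.weights, gradient)]
        self.clocks[worker] += 1


# Worker 0 is fast, worker 1 is slow; the allowed gap is one iteration.
server = DelaySyncServer(num_workers=2, max_delay=1)
server.push(0, [1.0])
server.push(0, [1.0])
fast_blocked = not server.can_proceed(0)   # worker 0 is now two iterations ahead
server.push(1, [1.0])
fast_released = server.can_proceed(0)      # the gap is back to one iteration
```

Under this assumed rule, max_delay = 0 behaves like synchronous parallelism (every worker waits for the slowest one after each iteration), while a very large max_delay approaches fully asynchronous training, so the parameter controls how much a fast node in a heterogeneous cluster is held back by a slow one.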
Keywords/Search Tags: Convolution Neural Network, Performance Heterogeneous, Parallelization, Parallel Computing Model, Delay Synchronization