
Optimization Of Resource Allocation Algorithm For Training Complex Neural Networks In Model Parallel Mode

Posted on: 2020-10-27    Degree: Master    Type: Thesis
Country: China    Candidate: J N Liu    Full Text: PDF
GTID: 2428330572488161    Subject: Computer system architecture
Abstract/Summary:
In recent years, research in the field of deep learning has developed rapidly. More complex network structures and deeper layers have made neural networks ever larger. Although this effectively improves prediction accuracy, it also increases the amount of computation and the demands placed on computing devices. A common approach to meeting these demands is to train models on heterogeneous systems that mix hardware devices such as CPUs and GPUs. However, as network scale continues to grow, a single GPU can no longer meet training needs, and training neural networks on a single node with multiple cards, or even on multiple nodes, becomes an effective solution. It is therefore important to partition computational graphs across multiple devices so that these neural networks can be trained efficiently.

Based on a CPU and GPU platform, this thesis studies how to improve the speed of training complex neural networks with model parallelism. We optimize the training process by using deep reinforcement learning to accelerate the whole training of neural networks, and we further accelerate the training process across multiple nodes. The main contents and contributions of this thesis are:

(1) Analyzing the architecture of the TensorFlow framework; analyzing, through code debugging, the training process of the target neural network on a heterogeneous system composed of CPUs and GPUs; understanding how the computations are organized; and forming a technical path for optimizing the resource allocation method used to train complex neural network models in model parallel mode.

(2) Aiming at the resource allocation problem of training complex neural network models in a single-machine multi-card environment in model parallel mode, this thesis studies a resource allocation optimization algorithm based on reinforcement learning, with the goal of obtaining a higher training speedup. The basic idea of the algorithm is to use a neural network model π(p) to generate a device placement p for the computational graph G of the target network to be trained. The graph G is then executed on the hardware system according to that placement, and the measured execution time t_i is treated as a reward used to update the parameters of the predictive model π(p) via backpropagation. After several training iterations, π(p) can generate a device placement that minimizes the target network's runtime t_min, yielding a resource allocation scheme for training the target network. Experiments show that this prediction-based resource allocation algorithm achieves a 28.41% performance improvement over a manually designed resource allocation.

(3) Aiming at the resource allocation problem of training complex neural networks in model parallel mode on multiple nodes with multiple GPUs, this thesis studies a method based on virtualization technology, with the goal of achieving better scalability when training neural networks in model parallel mode. By using GPU virtualization technology, the method can train neural networks in model parallel mode in a multi-node environment. We also apply the reinforcement-learning-based resource allocation algorithm to optimize the training process, and experimentally analyze the scalability of the method.
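The placement loop described in (2) can be sketched in miniature: sample a placement from a policy π(p), "execute" it to get a runtime, and use the negative runtime as a reward in a REINFORCE-style update. The sketch below is only illustrative, not the thesis's implementation: the cost model, the per-op independent softmax policy, and all constants are assumptions standing in for real graph execution on CPUs/GPUs.

```python
import numpy as np

rng = np.random.default_rng(0)

OPS = 6       # hypothetical number of operations in graph G
DEVICES = 2   # hypothetical number of GPUs

def runtime(placement):
    """Toy cost model standing in for a real measured step time t_i:
    the busiest device's compute load plus a fixed transfer penalty
    whenever consecutive ops land on different devices."""
    compute = np.array([1.0, 2.0, 1.5, 0.5, 2.5, 1.0])
    per_device = np.zeros(DEVICES)
    for op, dev in enumerate(placement):
        per_device[dev] += compute[op]
    transfer = sum(0.8 for i in range(OPS - 1)
                   if placement[i] != placement[i + 1])
    return per_device.max() + transfer

# Policy pi(p): an independent softmax over devices for each op.
logits = np.zeros((OPS, DEVICES))
baseline = None
best_placement, best_time = None, float("inf")

for step in range(300):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    placement = [int(rng.choice(DEVICES, p=probs[op])) for op in range(OPS)]
    t = runtime(placement)
    if t < best_time:
        best_time, best_placement = t, placement

    # Reward is negative runtime; a moving-average baseline
    # reduces the variance of the policy-gradient update.
    reward = -t
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    advantage = reward - baseline

    # REINFORCE: grad of log pi for a softmax is onehot(action) - probs.
    for op, dev in enumerate(placement):
        grad = -probs[op].copy()
        grad[dev] += 1.0
        logits[op] += 0.1 * advantage * grad

print(best_placement, best_time)
```

After enough iterations the best placement found balances compute across devices while avoiding unnecessary cross-device transfers, which is the same trade-off the thesis's π(p) must learn, except there the reward comes from actually running graph G on the hardware.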
Keywords/Search Tags: Operations placement, Deep reinforcement learning, Model parallelism, TensorFlow, Heterogeneous computing