
Optimizing the Scheduling of Data Parallelism on the Deep Learning Framework TensorFlow

Posted on: 2020-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Huang
Full Text: PDF
GTID: 2428330596476775
Subject: Engineering
Abstract/Summary:
With the rapid development of science and technology, artificial intelligence is being applied ever more widely in practical engineering, spanning medicine and diagnosis, robot control, finance, law, scientific discovery, toys, and many other fields that touch human life, which shows its importance. Whatever the field, it involves domain data sets and model training. To improve the efficiency of model training on massive data, distributed training has emerged, in both model-parallel and data-parallel forms, but several problems remain. First, in data parallelism with synchronous updates, the transmission of parameters is pure communication overhead rather than something that speeds up each iteration; because of the heavy communication, each iteration may take longer than on a single machine, the straggler ("short board") effect is pronounced, and resource utilization is low. Second, gradient staleness occurs with asynchronous updates: while some workers are still computing with an old gradient version, the parameter version on the parameter server has already been updated several times, and the stale gradients make the gradient-descent process unstable. No reliable algorithm has previously been proposed and implemented for these problems.

First, to address the problems above, this thesis proposes a ring-update model structure combined with gradient selection. In traditional solutions, gradient selection is used to reduce the volume of parameters that must be communicated, while ring updates are used to eliminate the parameter-server bottleneck in the system; this thesis combines the two. The gradient-selection algorithm first reduces the parameters to be exchanged, and the ring algorithm then accelerates the exchange and averaging of those parameters, so the model's training time is greatly reduced.

Second, for asynchronous updates in the data-parallel model, this thesis proposes an improved algorithm based on a staleness threshold, which effectively reduces instability during gradient descent. When a worker requests a parameter update, its parameter version is first compared with the current version on the server; if the difference exceeds the threshold, the update is discarded, otherwise it is applied as a new version.

The proposed methods are validated on the training time and accuracy of a face-recognition model. Compared with the original method, the experimental results show that the algorithms in this thesis effectively address the problems of data parallelism: they reduce the training time of the data-parallel model by 19.7%, improve GPU utilization by 22.9%, and enhance the stability of the system.
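The abstract does not give implementation details, but the first contribution it describes, gradient selection followed by a ring exchange, can be sketched roughly as below. This is a minimal single-process simulation, assuming top-k magnitude selection for the gradient-selection step and a naive, unchunked ring pass for the ring update; the function names, the keep_ratio parameter, and the simulated message passing are illustrative assumptions, not the thesis's actual code.

```python
import numpy as np

def select_gradient(grad, keep_ratio=0.01):
    """Gradient selection: keep only the largest-magnitude entries and zero
    out the rest, so far fewer values need to be communicated."""
    k = max(1, int(grad.size * keep_ratio))
    kth_largest = np.partition(np.abs(grad).ravel(), -k)[-k]
    return np.where(np.abs(grad) >= kth_largest, grad, 0.0)

def ring_average(grads):
    """Naive ring exchange: each node repeatedly forwards what it received
    to its right neighbour; after n - 1 steps every node has accumulated
    every other node's (already sparsified) gradient, with no central server."""
    n = len(grads)
    acc = [g.copy() for g in grads]    # running sum held by each node
    send = [g.copy() for g in grads]   # buffer each node forwards next step
    for _ in range(n - 1):
        recv = [send[(i - 1) % n] for i in range(n)]  # simulated ring messages
        for i in range(n):
            acc[i] = acc[i] + recv[i]
        send = recv
    return [a / n for a in acc]

# Example: four workers, each with a local gradient.
workers = [np.random.randn(10) for _ in range(4)]
sparse = [select_gradient(g, keep_ratio=0.3) for g in workers]
averaged = ring_average(sparse)  # every node ends with the same averaged gradient
```

A production ring all-reduce splits each gradient into n chunks and pipelines a reduce-scatter and an all-gather phase so per-node bandwidth stays roughly constant as workers are added; the naive pass above only illustrates that the averaging needs no central parameter server.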
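The staleness-threshold rule for asynchronous updates can likewise be sketched. The class name, the default threshold value, and the plain SGD update step are assumptions made for illustration; the thesis's actual parameter-server implementation is not described in the abstract.

```python
import numpy as np

class ThresholdParameterServer:
    """Sketch of a parameter server that discards updates whose gradient
    version lags the current parameter version by more than a threshold."""

    def __init__(self, init_params, staleness_threshold=4, lr=0.01):
        self.params = np.array(init_params, dtype=float)
        self.version = 0                       # version of the global parameters
        self.threshold = staleness_threshold   # maximum tolerated version gap
        self.lr = lr

    def pull(self):
        # A worker fetches the current parameters together with their version.
        return self.params.copy(), self.version

    def push(self, gradient, worker_version):
        # Compare the worker's parameter version with the server's current one.
        if self.version - worker_version > self.threshold:
            return False                       # too stale: abandon this update
        self.params -= self.lr * np.asarray(gradient)
        self.version += 1                      # accepted: becomes a new version
        return True
```

A worker would call pull(), compute a gradient on its mini-batch, then call push(grad, version); a rejected push simply pulls fresh parameters and retries, which is what keeps stale gradients from destabilizing the descent.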
Keywords/Search Tags: deep learning, TensorFlow, data parallelism, model training, circular update