
Optimization Of Distributed Training Strategies For Deep Learning Networks

Posted on: 2022-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Zhang
Full Text: PDF
GTID: 2518306536988319
Subject: Information and Communication Engineering

Abstract/Summary:
With the rapid development of artificial intelligence, machine learning is trending toward big data, large models, and large-scale cluster training. This not only strengthens the capability of machine learning, but also imposes higher technical requirements on machine learning strategies. Distributed machine learning is being studied by more and more scholars and put into industrial application, focusing on data partitioning, task allocation, resource scheduling, and communication optimization to balance training speed against convergence accuracy. In order to accelerate machine learning training and improve computation and communication efficiency, this thesis focuses on distributed parallel strategies for deep learning networks and on communication optimization schemes. The main contents are as follows:

Firstly, this thesis designs NetPlacer, a distributed optimization tool based on device balance that searches for the best distributed training strategy. In distributed model parallelism, different parts of the network model are allocated to different devices, resulting in different distributions of computation and communication across the network and therefore different training speeds, so the best distributed strategy must be selected to minimize training time. The most common current approach searches for the optimal solution by simulating execution time, but it suffers from simulation error. To avoid this problem, this thesis designs a distributed optimization tool based on device balance, which searches for the optimal distributed placement with the goal of balancing memory and computation across devices. The tool avoids the errors caused by time simulation, improves device parallelism, and reduces cross-device communication, significantly speeding up distributed machine learning training.

Secondly, this thesis designs GradOptimizer, a gradient-grouping optimization algorithm that optimizes gradient communication in distributed training. In distributed data parallelism, the gradients on different devices must undergo allreduce communication so that each device obtains the complete gradient data. Deep learning networks have many layers, and because the gradient of each layer must be allreduced, this incurs additional communication overhead. To avoid this problem, this thesis designs an optimization algorithm for grouped gradient communication, which communicates gradients in groups and finds the optimal grouping scheme by simulating the computation and communication of the gradient data. This algorithm greatly improves the parallelism of gradient computation and communication, thereby speeding up distributed machine learning training.
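
As a rough illustration of the device-balance idea, and not the thesis's actual NetPlacer implementation, the Python sketch below greedily assigns network layers to the device with the lowest combined memory and compute load. The layer statistics, the 0.5 memory weight, and the greedy strategy are illustrative assumptions, and the sketch ignores the cross-device communication that the real tool also reduces.

    # Illustrative sketch of a device-balance placement heuristic (not the
    # thesis's NetPlacer code). Each layer carries an estimated memory
    # footprint and compute cost; layers are assigned greedily to the device
    # whose combined load is currently lowest.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Layer:
        name: str
        memory_mb: float   # estimated parameter + activation memory
        flops: float       # estimated compute cost

    @dataclass
    class Device:
        name: str
        layers: List[Layer] = field(default_factory=list)
        memory_mb: float = 0.0
        flops: float = 0.0

    def balance_score(dev: Device, layer: Layer, mem_weight: float = 0.5) -> float:
        """Combined load if `layer` were placed on `dev` (lower is better)."""
        return (mem_weight * (dev.memory_mb + layer.memory_mb)
                + (1.0 - mem_weight) * (dev.flops + layer.flops))

    def place(layers: List[Layer], devices: List[Device]) -> List[Device]:
        """Greedy placement: each layer goes to the least-loaded device."""
        for layer in layers:
            target = min(devices, key=lambda d: balance_score(d, layer))
            target.layers.append(layer)
            target.memory_mb += layer.memory_mb
            target.flops += layer.flops
        return devices

    if __name__ == "__main__":
        net = [Layer(f"layer{i}", memory_mb=10 * (i % 4 + 1), flops=1e9 * (i % 3 + 1))
               for i in range(12)]
        gpus = [Device("gpu0"), Device("gpu1"), Device("gpu2")]
        for dev in place(net, gpus):
            print(dev.name, [l.name for l in dev.layers], dev.memory_mb, dev.flops)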
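
Similarly, the sketch below illustrates the gradient-grouping idea rather than the thesis's GradOptimizer: per-layer gradients are packed into buckets of a candidate size, and each candidate is scored with an assumed latency-plus-bandwidth allreduce model. The constants and gradient sizes are hypothetical, and a fuller version would also simulate overlap with backward computation, as the thesis describes.

    # Illustrative sketch of gradient grouping for allreduce (not the thesis's
    # GradOptimizer). Per-layer gradients are packed into buckets of a candidate
    # size, and each candidate is scored with a simple latency + bandwidth
    # model; the bucket size with the lowest simulated cost is selected.

    from typing import List

    ALPHA = 1e-4          # assumed per-allreduce launch latency (seconds)
    BETA = 1.0 / 10e9     # assumed seconds per byte of allreduce traffic

    def make_buckets(grad_bytes: List[int], bucket_bytes: int) -> List[List[int]]:
        """Pack consecutive layer gradients into buckets of roughly bucket_bytes."""
        buckets, current, size = [], [], 0
        for g in grad_bytes:
            current.append(g)
            size += g
            if size >= bucket_bytes:
                buckets.append(current)
                current, size = [], 0
        if current:
            buckets.append(current)
        return buckets

    def simulate_cost(buckets: List[List[int]]) -> float:
        """Simulated cost: one latency term per bucket plus a bandwidth term."""
        return sum(ALPHA + BETA * sum(b) for b in buckets)

    def best_bucket_size(grad_bytes: List[int], candidates: List[int]) -> int:
        """Pick the candidate bucket size with the lowest simulated allreduce cost."""
        return min(candidates,
                   key=lambda s: simulate_cost(make_buckets(grad_bytes, s)))

    if __name__ == "__main__":
        # Assumed per-layer gradient sizes (bytes) for a hypothetical network.
        grads = [4_000_000, 2_000_000, 8_000_000, 1_000_000] * 10
        sizes = [1_000_000, 5_000_000, 25_000_000, 100_000_000]
        print("best bucket size:", best_bucket_size(grads, sizes))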
Keywords/Search Tags: Distributed machine learning, model parallelism, device balancing, data parallelism, gradient grouping