
Optimization Of Distributed Training Strategies For Deep Learning Networks

Posted on: 2022-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Zhang
Full Text: PDF
GTID: 2518306536988319
Subject: Information and Communication Engineering

Abstract/Summary:
With the rapid development of artificial intelligence, machine learning is trending toward big data, large models, and large-scale cluster training. This not only strengthens the capability of machine learning, but also imposes higher technical requirements on machine learning strategies. Distributed machine learning is being studied by more and more scholars and put into industrial application, focusing on data partitioning, task allocation, resource scheduling, and communication optimization to balance training speed against convergence accuracy. In order to accelerate machine learning training and improve computation and communication efficiency, this thesis focuses on distributed parallel strategies for deep learning networks and on communication optimization schemes. The main contents are as follows:

Firstly, this thesis designs NetPlacer, a distributed optimization tool based on device balance that searches for the best distributed training strategy. In distributed model parallelism, different parts of the network model are allocated to different devices, resulting in different distributions of computation and communication across the network and therefore different training speeds, so the best distributed strategy must be selected to minimize training time. The most common current approach searches for the optimal solution by simulating execution time, but it suffers from simulation error. To avoid this problem, this thesis designs a distributed optimization tool based on device balance, which searches for the optimal distributed placement with the goal of balancing memory and computation across devices. The tool avoids the errors caused by time simulation, improves device parallelism, and reduces cross-device communication, significantly speeding up distributed machine learning training.

Secondly, this thesis designs GradOptimizer, a gradient-grouping optimization algorithm that optimizes gradient communication in distributed training. In distributed data parallelism, the gradients on different devices must undergo allreduce communication so that each device obtains the complete gradient data. Deep learning networks have many layers, and because the gradient of each layer must be allreduced, this incurs additional communication overhead. To avoid this problem, this thesis designs an optimization algorithm for grouped gradient communication, which communicates gradients in groups and finds the optimal grouping scheme by simulating the computation and communication of the gradient data. This algorithm greatly improves the parallelism of gradient computation and communication, thereby speeding up distributed machine learning training.
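
As a rough illustration of the device-balance idea, and not the thesis's actual NetPlacer implementation, the Python sketch below greedily assigns network layers to the device with the lowest combined memory and compute load. The layer statistics, the 0.5 memory weight, and the greedy strategy are illustrative assumptions, and the sketch ignores the cross-device communication that the real tool also reduces.

    # Illustrative sketch of a device-balance placement heuristic (not the
    # thesis's NetPlacer code). Each layer carries an estimated memory
    # footprint and compute cost; layers are assigned greedily to the device
    # whose combined load is currently lowest.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Layer:
        name: str
        memory_mb: float   # estimated parameter + activation memory
        flops: float       # estimated compute cost

    @dataclass
    class Device:
        name: str
        layers: List[Layer] = field(default_factory=list)
        memory_mb: float = 0.0
        flops: float = 0.0

    def balance_score(dev: Device, layer: Layer, mem_weight: float = 0.5) -> float:
        """Combined load if `layer` were placed on `dev` (lower is better)."""
        return (mem_weight * (dev.memory_mb + layer.memory_mb)
                + (1.0 - mem_weight) * (dev.flops + layer.flops))

    def place(layers: List[Layer], devices: List[Device]) -> List[Device]:
        """Greedy placement: each layer goes to the least-loaded device."""
        for layer in layers:
            target = min(devices, key=lambda d: balance_score(d, layer))
            target.layers.append(layer)
            target.memory_mb += layer.memory_mb
            target.flops += layer.flops
        return devices

    if __name__ == "__main__":
        net = [Layer(f"layer{i}", memory_mb=10 * (i % 4 + 1), flops=1e9 * (i % 3 + 1))
               for i in range(12)]
        gpus = [Device("gpu0"), Device("gpu1"), Device("gpu2")]
        for dev in place(net, gpus):
            print(dev.name, [l.name for l in dev.layers], dev.memory_mb, dev.flops)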
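
Similarly, the sketch below illustrates the gradient-grouping idea rather than the thesis's GradOptimizer: per-layer gradients are packed into buckets of a candidate size, and each candidate is scored with an assumed latency-plus-bandwidth allreduce model. The constants and gradient sizes are hypothetical, and a fuller version would also simulate overlap with backward computation, as the thesis describes.

    # Illustrative sketch of gradient grouping for allreduce (not the thesis's
    # GradOptimizer). Per-layer gradients are packed into buckets of a candidate
    # size, and each candidate is scored with a simple latency + bandwidth
    # model; the bucket size with the lowest simulated cost is selected.

    from typing import List

    ALPHA = 1e-4          # assumed per-allreduce launch latency (seconds)
    BETA = 1.0 / 10e9     # assumed seconds per byte of allreduce traffic

    def make_buckets(grad_bytes: List[int], bucket_bytes: int) -> List[List[int]]:
        """Pack consecutive layer gradients into buckets of roughly bucket_bytes."""
        buckets, current, size = [], [], 0
        for g in grad_bytes:
            current.append(g)
            size += g
            if size >= bucket_bytes:
                buckets.append(current)
                current, size = [], 0
        if current:
            buckets.append(current)
        return buckets

    def simulate_cost(buckets: List[List[int]]) -> float:
        """Simulated cost: one latency term per bucket plus a bandwidth term."""
        return sum(ALPHA + BETA * sum(b) for b in buckets)

    def best_bucket_size(grad_bytes: List[int], candidates: List[int]) -> int:
        """Pick the candidate bucket size with the lowest simulated allreduce cost."""
        return min(candidates,
                   key=lambda s: simulate_cost(make_buckets(grad_bytes, s)))

    if __name__ == "__main__":
        # Assumed per-layer gradient sizes (bytes) for a hypothetical network.
        grads = [4_000_000, 2_000_000, 8_000_000, 1_000_000] * 10
        sizes = [1_000_000, 5_000_000, 25_000_000, 100_000_000]
        print("best bucket size:", best_bucket_size(grads, sizes))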
Keywords/Search Tags: Distributed machine learning, model parallelism, device balancing, data parallelism, gradient grouping