
Communication Optimization Technique For Distributed Synchronous Data Parallel Training

Posted on: 2022-12-20    Degree: Master    Type: Thesis
Country: China    Candidate: N F Bi    Full Text: PDF
GTID: 2518306776493524    Subject: Automation Technology
Abstract/Summary:
In recent years, deep learning techniques have developed rapidly. In particular, driven by large-scale datasets, distributed deep learning systems have been widely adopted in both academia and industry. These systems commonly employ synchronous data parallelism to train models. Synchronous stochastic gradient descent (SSGD) is the most widely used distributed synchronous data parallel training algorithm, and it requires communication over the network in every iteration. However, this communication overhead is expensive in distributed environments with limited bandwidth. A straightforward way to reduce the communication overhead is to increase the communication interval, i.e., to communicate only once every several iterations instead of in every iteration. However, increasing the communication interval usually slows the convergence of the model, so the training algorithm needs more epochs to reach the target accuracy; that is, the statistical efficiency of the training algorithm decreases. In addition, the choice of the communication interval directly determines the performance of the training algorithm, yet existing methods for choosing it introduce expensive additional overhead for collecting statistics or tuning hyper-parameters.

To address these problems in distributed synchronous data parallel training algorithms and in methods for choosing the communication interval, we focus on training algorithms with both low communication overhead and high statistical efficiency, and on communication interval selection methods with low additional overhead. The main contributions of this thesis are as follows.

We propose a training algorithm that combines a skipping strategy with a correction technique, ensuring both low communication overhead and high statistical efficiency. The algorithm keeps a small batch size by performing local updates in each training process, and it reduces the divergence among local models with the correction technique, thus ensuring high statistical efficiency. Meanwhile, it employs the skipping strategy to update the global model: instead of updating the global model in every iteration, it does so only once every several iterations. This reduces the communication frequency and thus ensures low communication overhead.
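The abstract does not spell out the concrete correction technique, so the following single-process NumPy sketch only illustrates the general scheme on a toy linear-regression task: local updates with a small batch size, a global model refreshed only every tau iterations (the skipping strategy), and an assumed drift-correction term that pulls each local model toward the last synchronized global model. The hyper-parameters (tau, lr, mu, batch) and the correction form are illustrative assumptions, not the thesis's exact algorithm.

```python
# Illustrative single-process simulation of local training with a skipping
# strategy and a simple correction term. The correction used here (a pull
# toward the last synchronized global model) is an assumption for
# illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data, split across simulated "workers".
n_workers, n_samples, dim = 4, 4096, 10
w_true = rng.normal(size=dim)
X = rng.normal(size=(n_samples, dim))
y = X @ w_true + 0.01 * rng.normal(size=n_samples)
shards = np.array_split(np.arange(n_samples), n_workers)

tau = 8      # communication interval: sync the global model every tau iterations
lr = 0.05    # learning rate for local updates
mu = 0.1     # strength of the (assumed) correction toward the global model
batch = 32   # small local batch size, kept constant

global_w = np.zeros(dim)
local_w = [global_w.copy() for _ in range(n_workers)]

for it in range(1, 201):
    for k in range(n_workers):
        idx = rng.choice(shards[k], size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ local_w[k] - y[idx]) / batch
        # Local update plus a correction term that limits divergence from the
        # last synchronized global model.
        local_w[k] -= lr * (grad + mu * (local_w[k] - global_w))
    if it % tau == 0:
        # Skipping strategy: the global model is updated only every tau
        # iterations, by averaging the local models (one communication round).
        global_w = np.mean(local_w, axis=0)
        local_w = [global_w.copy() for _ in range(n_workers)]

print("distance to w_true:", np.linalg.norm(global_w - w_true))
```

In this sketch the number of communication rounds drops by a factor of tau compared with synchronizing every iteration, while the correction term keeps the local models from drifting too far apart between synchronizations.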
We design an adaptive communication interval strategy based on the runtime statistics of the first iteration, which reduces the additional overhead of choosing the communication interval. The strategy initializes the communication interval to 1 and measures the time spent on communication and on computation in the first iteration. Based on these statistics, it adjusts the communication interval so that the time spent on communication and the time spent on computation in each epoch are close. After this one-time adjustment, the interval is applied to all subsequent iterations; no further statistics are collected and no further adjustments are made, which ensures low additional overhead.

We implement a prototype system that incorporates both the training algorithm combining the skipping strategy and the correction technique and the adaptive communication interval strategy. Both are implemented in TensorFlow, a distributed deep learning system. Based on this prototype system, we evaluate the efficiency of the training algorithm and of the communication interval strategy, and we elaborate on the design of the prototype system.

In summary, to address the high communication overhead of distributed synchronous data parallel training, we propose a communication-optimized training algorithm and a tuning strategy, and we implement a prototype system. Experimental results demonstrate the efficiency of these communication optimization techniques. In particular, compared with the SSGD training algorithm, our training algorithm combining the skipping strategy and the correction technique reduces the overall training time by 88.9%. Compared with existing communication interval selection strategies, our adaptive communication interval strategy reduces the additional overhead by three orders of magnitude.
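A minimal sketch of the adaptive communication interval strategy described above, in plain Python. The functions train_step and sync_global_model are hypothetical placeholders for the local computation and the global synchronization, and the rule tau ≈ t_comm / t_comp is our reading of "making the per-epoch communication and computation times close"; it is not taken verbatim from the thesis.

```python
# Sketch: choose the communication interval once, from first-iteration timings.
import time


def choose_interval(t_comp: float, t_comm: float) -> int:
    # With interval tau, an epoch of N iterations spends about N * t_comp on
    # computation and (N / tau) * t_comm on communication; making these close
    # gives tau ≈ t_comm / t_comp (our reading of the rule above).
    return max(1, round(t_comm / t_comp))


def train(num_iterations, train_step, sync_global_model):
    interval = 1  # the interval is initialized to 1
    for it in range(1, num_iterations + 1):
        if it == 1:
            start = time.perf_counter()
            train_step()
            t_comp = time.perf_counter() - start

            start = time.perf_counter()
            sync_global_model()
            t_comm = time.perf_counter() - start

            # Adjust once after the first iteration; no statistics are
            # collected afterwards, keeping the additional overhead low.
            interval = choose_interval(t_comp, t_comm)
        else:
            train_step()
            if it % interval == 0:
                sync_global_model()


if __name__ == "__main__":
    # Dummy steps: ~10 ms of computation and ~80 ms of communication per
    # round, so the chosen interval should come out around 8.
    train(20, lambda: time.sleep(0.01), lambda: time.sleep(0.08))
```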
Keywords/Search Tags:Deep Learning System, Synchronous Training, Data Parallelism, Distributed Training, Communication Optimization