
Research On Layer-by-layer Adaptive Communication Optimization Method For Distributed Deep Learning

Posted on: 2022-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Zhu
Full Text: PDF
GTID: 2568307034474074
Subject: Computer technology

Abstract/Summary:
As the scale of deep learning continues to grow, traditional single-machine training can no longer meet practical needs, and distributed deep learning has become essential for improving training efficiency. However, because large volumes of gradient data must be exchanged among the computing nodes, communication has become a major bottleneck for distributed training. To address this problem, a layer-by-layer adaptive communication optimization algorithm, LACO, is proposed, consisting mainly of the LADS method and an 8-bit quantization method. Exploiting the discrete characteristics of the transmitted gradients, the LADS algorithm determines the value of the sparsity parameter α through pre-training and computes the L2 norm of the gradient vector layer by layer to set a per-layer sparsification threshold. The 8-bit quantization method applies logarithmic encoding to single-precision floating-point numbers, lossily compressing 32-bit floats to 8 bits while maintaining acceptable accuracy. On top of LADS and 8-bit quantization, three training optimizations (residual gradients, warm start, and layer-wise sparsity-rate adjustment) further improve the convergence and accuracy of distributed training. The algorithm is implemented on the Horovod framework using the Ring Allreduce communication scheme.

Experiments were conducted with a variety of neural networks on five datasets: MNIST, CIFAR10, CIFAR100, PTB, and HPRC. Compared with the SignSGD and Gradient Dropping algorithms, LACO achieves similar data compression rates while improving training accuracy by up to 1.93%, and its convergence is almost identical to that of a baseline that uses no communication optimization. In terms of training efficiency, LACO achieves a speedup of up to 2.54 times over the baseline.
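For illustration only, the NumPy sketch below shows how the two compression steps described above could fit together. The function names lads_sparsify and log_quantize_8bit, the specific threshold rule (α times the layer's root-mean-square gradient magnitude), and the 1-sign-bit plus 7-bit-log2 code layout are assumptions; the abstract states only that the per-layer threshold is derived from the layer's L2 norm and α, and that 32-bit floats are logarithmically encoded into 8 bits.

import numpy as np

def lads_sparsify(layer_grad, alpha):
    """Per-layer gradient sparsification in the spirit of LADS.

    Keeps only entries whose magnitude exceeds a threshold derived from the
    layer's L2 norm scaled by alpha; the dropped part is returned as a
    residual to be accumulated into the next iteration (residual gradients)."""
    # Assumed threshold rule: alpha times the layer's RMS gradient magnitude.
    threshold = alpha * np.linalg.norm(layer_grad) / np.sqrt(layer_grad.size)
    mask = np.abs(layer_grad) >= threshold
    sparse_grad = np.where(mask, layer_grad, 0.0)
    residual = layer_grad - sparse_grad
    return sparse_grad, residual

def log_quantize_8bit(x, eps=1e-12):
    """Generic lossy 8-bit logarithmic encoding of float32 values:
    1 sign bit plus 7 bits holding the clipped, rounded log2 magnitude."""
    x = np.asarray(x, dtype=np.float32)
    sign = np.signbit(x).astype(np.uint8)                       # 1 = negative
    exp = np.clip(np.round(np.log2(np.abs(x) + eps)), -63, 63)  # clip exponent range
    code = (sign << 7) | ((exp.astype(np.int16) + 63).astype(np.uint8) & 0x7F)
    return code

def log_dequantize_8bit(code):
    """Inverse of log_quantize_8bit: recover sign and power-of-two magnitude."""
    sign = np.where((code >> 7) & 1, -1.0, 1.0)
    exp = (code & 0x7F).astype(np.float32) - 63.0
    return (sign * np.exp2(exp)).astype(np.float32)

# Toy usage: sparsify one layer's gradient, then quantize the surviving entries.
grad = np.random.randn(1000).astype(np.float32)
sparse, residual = lads_sparsify(grad, alpha=1.5)
codes = log_quantize_8bit(sparse[sparse != 0])   # number of kept entries depends on alpha
recovered = log_dequantize_8bit(codes)

In the actual LACO pipeline, the sparse, quantized gradients would then be exchanged among workers through Horovod's Ring Allreduce, with the residuals accumulated locally on each node.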
Keywords/Search Tags:Distributed deep learning, Gradient quantization, Gradient sparsity, Ring Allreduce