
Research On Layer-by-layer Adaptive Communication Optimization Method For Distributed Deep Learning

Posted on: 2022-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Zhu
Full Text: PDF
GTID: 2568307034474074
Subject: Computer technology

Abstract/Summary:
As the scale of deep learning continues to grow, traditional single-machine training can no longer meet practical needs, and distributed deep learning has become essential for improving training efficiency. However, because large volumes of gradient data must be exchanged among the computing nodes, communication has become a major bottleneck for distributed training. To address this problem, a layer-by-layer adaptive communication optimization algorithm, LACO, is proposed, consisting mainly of the LADS method and an 8-bit quantization method. Exploiting the discrete characteristics of the transmitted gradients, the LADS algorithm determines the value of the sparsity parameter α through pre-training and computes the L2 norm of the gradient vector layer by layer to set a per-layer sparsification threshold. The 8-bit quantization method applies logarithmic encoding to single-precision floating-point numbers, lossily compressing 32-bit floats to 8 bits while maintaining acceptable accuracy. On top of LADS and 8-bit quantization, three training optimizations (residual gradients, warm start, and layer-wise sparsity-rate adjustment) further improve the convergence and accuracy of distributed training. The algorithm is implemented on the Horovod framework using the Ring Allreduce communication scheme.

Experiments were conducted with a variety of neural networks on five datasets: MNIST, CIFAR10, CIFAR100, PTB, and HPRC. Compared with the SignSGD and Gradient Dropping algorithms, LACO achieves similar data compression rates while improving training accuracy by up to 1.93%, and its convergence is almost identical to that of a baseline that uses no communication optimization. In terms of training efficiency, LACO achieves a speedup of up to 2.54 times over the baseline.
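For illustration only, the NumPy sketch below shows how the two compression steps described above could fit together. The function names lads_sparsify and log_quantize_8bit, the specific threshold rule (α times the layer's root-mean-square gradient magnitude), and the 1-sign-bit plus 7-bit-log2 code layout are assumptions; the abstract states only that the per-layer threshold is derived from the layer's L2 norm and α, and that 32-bit floats are logarithmically encoded into 8 bits.

import numpy as np

def lads_sparsify(layer_grad, alpha):
    """Per-layer gradient sparsification in the spirit of LADS.

    Keeps only entries whose magnitude exceeds a threshold derived from the
    layer's L2 norm scaled by alpha; the dropped part is returned as a
    residual to be accumulated into the next iteration (residual gradients)."""
    # Assumed threshold rule: alpha times the layer's RMS gradient magnitude.
    threshold = alpha * np.linalg.norm(layer_grad) / np.sqrt(layer_grad.size)
    mask = np.abs(layer_grad) >= threshold
    sparse_grad = np.where(mask, layer_grad, 0.0)
    residual = layer_grad - sparse_grad
    return sparse_grad, residual

def log_quantize_8bit(x, eps=1e-12):
    """Generic lossy 8-bit logarithmic encoding of float32 values:
    1 sign bit plus 7 bits holding the clipped, rounded log2 magnitude."""
    x = np.asarray(x, dtype=np.float32)
    sign = np.signbit(x).astype(np.uint8)                       # 1 = negative
    exp = np.clip(np.round(np.log2(np.abs(x) + eps)), -63, 63)  # clip exponent range
    code = (sign << 7) | ((exp.astype(np.int16) + 63).astype(np.uint8) & 0x7F)
    return code

def log_dequantize_8bit(code):
    """Inverse of log_quantize_8bit: recover sign and power-of-two magnitude."""
    sign = np.where((code >> 7) & 1, -1.0, 1.0)
    exp = (code & 0x7F).astype(np.float32) - 63.0
    return (sign * np.exp2(exp)).astype(np.float32)

# Toy usage: sparsify one layer's gradient, then quantize the surviving entries.
grad = np.random.randn(1000).astype(np.float32)
sparse, residual = lads_sparsify(grad, alpha=1.5)
codes = log_quantize_8bit(sparse[sparse != 0])   # number of kept entries depends on alpha
recovered = log_dequantize_8bit(codes)

In the actual LACO pipeline, the sparse, quantized gradients would then be exchanged among workers through Horovod's Ring Allreduce, with the residuals accumulated locally on each node.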
Keywords/Search Tags:Distributed deep learning, Gradient quantization, Gradient sparsity, Ring Allreduce