Communication Optimization Of Distributed Deep Learning System Based On Model Structure Characteristics

Posted on: 2021-02-05    Degree: Master    Type: Thesis
Country: China    Candidate: J Peng    Full Text: PDF
GTID: 2518306104488174    Subject: Computer system architecture
Abstract/Summary:
In distributed deep learning training, synchronizing gradients and parameters usually incurs substantial network communication overhead. Existing communication optimization methods mainly include gradient compression through sparsification or quantization, along with optimizations of the communication mode. However, existing gradient compression strategies generally apply an identical method to all layers of a network, or to different networks, ignoring the differences between layers within a network as well as the differences between networks, which leads to unsatisfactory optimization results.

To overcome these shortcomings, a hybrid communication optimization method named Hylo is proposed based on model structure characteristics. Two different strategies are used to compress the gradients of two types of layers: convolution layers and fully-connected layers. Each layer is handled according to its parameter scale: when the parameter scale exceeds a threshold, the transmission cost is considerable and the layer is compressed. For a convolution layer, an adaptive transmission rate is set according to the parameter scale of the layer; only the gradients of the most important convolution kernels are transmitted to the parameter server, while the gradients of the remaining kernels are accumulated locally until the next round of parameter updates. For a fully-connected layer, a threshold is set adaptively according to the overall magnitude of the gradients in the layer; this threshold is used to quantize the gradients, originally represented as 32-bit floating-point numbers, into 2 bits before they are transmitted to the parameter server. The parameter server decompresses the compressed gradients to recover them, and the bias introduced by quantization is accumulated locally until the next round of parameter updates.

Hylo is implemented on MXNet, a representative distributed deep learning system, and evaluated on representative datasets such as CIFAR-10, CIFAR-100, and ImageNet-1K (ILSVRC 2012). The results show that Hylo brings significant acceleration to distributed deep learning training tasks while keeping the loss in accuracy below 0.5%. Compared with existing methods, Hylo speeds up deep learning training by up to 30%.
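The following is a minimal NumPy sketch of the two compression strategies described above, written for illustration only. The per-kernel importance metric (L1 norm), the fixed transmission rate, the choice of mean absolute value as the quantization threshold, and all function and variable names are assumptions of this sketch; the thesis sets these quantities adaptively per layer and implements them inside MXNet's parameter-server communication path.

    import numpy as np

    def compress_conv_grad(grad, residual, rate=0.25):
        """Keep only the gradients of the most important convolution kernels.

        grad, residual: arrays of shape (num_kernels, in_ch, kh, kw).
        Returns the sparse gradient to transmit and the updated local residual.
        """
        g = grad + residual                                        # fold in locally accumulated gradients
        importance = np.abs(g).reshape(g.shape[0], -1).sum(axis=1) # per-kernel L1 norm (assumed metric)
        k = max(1, int(rate * g.shape[0]))                         # number of kernels to transmit
        top = np.argsort(importance)[-k:]                          # indices of the most important kernels
        send = np.zeros_like(g)
        send[top] = g[top]                                         # transmit only the selected kernels
        new_residual = g - send                                    # accumulate the rest for the next round
        return send, new_residual

    def compress_fc_grad(grad, residual):
        """Quantize fully-connected gradients to 2-bit codes {-1, 0, +1} scaled by t.

        The threshold t is derived from the overall gradient magnitude; using the
        mean absolute value here is an assumption of this sketch.
        """
        g = grad + residual
        t = np.abs(g).mean()                     # adaptive threshold from overall magnitude
        q = np.zeros_like(g, dtype=np.int8)      # 2-bit codes: -1, 0, +1
        q[g > t] = 1
        q[g < -t] = -1
        decompressed = q.astype(g.dtype) * t     # what the parameter server reconstructs
        new_residual = g - decompressed          # quantization error kept locally
        return q, t, new_residual

In both functions, the difference between the true gradient and what is actually transmitted is kept as a local residual and folded into the next round, which matches the error-accumulation behavior described in the abstract.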
Keywords/Search Tags:Distributed deep learning, Communication optimization, Convolution layer, Fully-connected layer