Communication Optimization Of Distributed Deep Learning System Based On Model Structure Characteristics

Posted on: 2021-02-05    Degree: Master    Type: Thesis
Country: China    Candidate: J Peng    Full Text: PDF
GTID: 2518306104488174    Subject: Computer system architecture
Abstract/Summary:
In distributed deep learning training, synchronizing gradients and parameters usually incurs substantial network communication overhead. Existing communication optimization methods mainly include gradient compression through sparsification or quantization, along with optimizations of the communication mode. However, existing gradient compression strategies generally apply an identical method to all layers of a network, or to different networks, ignoring the differences between layers within a network as well as the differences between networks, which leads to unsatisfactory optimization results.

To overcome these shortcomings, a hybrid communication optimization method named Hylo is proposed based on model structure characteristics. Two different strategies are used to compress the gradients of two types of layers: convolution layers and fully-connected layers. Each layer is handled according to its parameter scale: when the parameter scale exceeds a threshold, the transmission cost is considerable and the layer is compressed. For a convolution layer, an adaptive transmission rate is set according to the parameter scale of the layer; only the gradients of the most important convolution kernels are transmitted to the parameter server, while the gradients of the remaining kernels are accumulated locally until the next round of parameter updates. For a fully-connected layer, a threshold is set adaptively according to the overall magnitude of the gradients in the layer; this threshold is used to quantize the gradients, originally represented as 32-bit floating-point numbers, into 2 bits before they are transmitted to the parameter server. The parameter server decompresses the compressed gradients to recover them, and the bias introduced by quantization is accumulated locally until the next round of parameter updates.

Hylo is implemented on MXNet, a representative distributed deep learning system, and evaluated on representative datasets such as CIFAR-10, CIFAR-100, and ImageNet-1K (ILSVRC 2012). The results show that Hylo brings significant acceleration to distributed deep learning training tasks while keeping the loss in accuracy below 0.5%. Compared with existing methods, Hylo speeds up deep learning training by up to 30%.
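The following is a minimal NumPy sketch of the two compression strategies described above, written for illustration only. The per-kernel importance metric (L1 norm), the fixed transmission rate, the choice of mean absolute value as the quantization threshold, and all function and variable names are assumptions of this sketch; the thesis sets these quantities adaptively per layer and implements them inside MXNet's parameter-server communication path.

    import numpy as np

    def compress_conv_grad(grad, residual, rate=0.25):
        """Keep only the gradients of the most important convolution kernels.

        grad, residual: arrays of shape (num_kernels, in_ch, kh, kw).
        Returns the sparse gradient to transmit and the updated local residual.
        """
        g = grad + residual                                        # fold in locally accumulated gradients
        importance = np.abs(g).reshape(g.shape[0], -1).sum(axis=1) # per-kernel L1 norm (assumed metric)
        k = max(1, int(rate * g.shape[0]))                         # number of kernels to transmit
        top = np.argsort(importance)[-k:]                          # indices of the most important kernels
        send = np.zeros_like(g)
        send[top] = g[top]                                         # transmit only the selected kernels
        new_residual = g - send                                    # accumulate the rest for the next round
        return send, new_residual

    def compress_fc_grad(grad, residual):
        """Quantize fully-connected gradients to 2-bit codes {-1, 0, +1} scaled by t.

        The threshold t is derived from the overall gradient magnitude; using the
        mean absolute value here is an assumption of this sketch.
        """
        g = grad + residual
        t = np.abs(g).mean()                     # adaptive threshold from overall magnitude
        q = np.zeros_like(g, dtype=np.int8)      # 2-bit codes: -1, 0, +1
        q[g > t] = 1
        q[g < -t] = -1
        decompressed = q.astype(g.dtype) * t     # what the parameter server reconstructs
        new_residual = g - decompressed          # quantization error kept locally
        return q, t, new_residual

In both functions, the difference between the true gradient and what is actually transmitted is kept as a local residual and folded into the next round, which matches the error-accumulation behavior described in the abstract.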
Keywords/Search Tags:Distributed deep learning, Communication optimization, Convolution layer, Fully-connected layer