
Research On Data Extraction And Communication Optimization For Distributed Deep Learning

Posted on: 2020-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhu
GTID: 2428330590458356
Subject: Cyberspace security

Abstract/Summary:
With the widening and deepening of neural network models in deep learning and the growth of dataset sizes, traditional single-machine training can no longer meet practical needs. To achieve efficient training, distributed deep learning has emerged. At the same time, the high communication overhead between physical machines in distributed deep learning brings new challenges. To solve the problem of low training efficiency caused by this overhead, researchers have proposed a variety of methods to improve communication efficiency.

Based on the characteristics of gradient parameters, this thesis proposes an optimized transmission method based on quantization and compression. Following the observation that gradients with large absolute values contribute most to convergence, transmitting only those gradients can greatly improve the communication efficiency of distributed deep learning. The proposed Fixed-Exponential Compression (FEC) algorithm mainly uses three optimization strategies. First, the gradient parameters are filtered according to a specified exponential threshold: only gradients whose absolute values are greater than or equal to the threshold are transmitted, which increases the sparsity of the gradients. Second, each of the filtered 32-bit floating-point gradient values is compressed to 5 bits using a 5-bit quantization algorithm. To make the quantization more efficient, multi-threaded parallel computing is used to implement the 5-bit quantization, which greatly improves its throughput. At the same time, the 5 bits of the quantized output are split into a 1-bit part and a 4-bit part to improve space utilization. Third, because the first step increases the sparsity of the gradient parameters, a zero-run compression algorithm is applied to consecutive zero values, further reducing the amount of data transmitted. A minimal sketch of these three steps appears after this abstract.

The performance of FEC was tested on a cluster using the MXNet system, training various neural networks on four datasets: MNIST, CIFAR10, CIFAR100, and ImageNet1K. The experimental results are compared with the baseline and the 2Bit compression method in MXNet. In terms of gradient compression ratio, the FEC method achieves a compression ratio of 16.3 to 23.3 times, higher than that of 2Bit. In terms of communication efficiency, the FEC method accelerates training by 1.18 to 3.34 times over the baseline. At the same time, the convergence of FEC basically matches that of the baseline, and compared to the 2Bit method, its convergence stability and accuracy are better.
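The abstract does not give the exact FEC encoding, so the following is only a minimal NumPy sketch of the three steps it describes: exponential-threshold filtering, 5-bit quantization split into a 1-bit sign and a 4-bit value, and run-length compression of consecutive zeros. The default threshold, the interpretation of the 1-bit/4-bit split as sign plus clipped exponent, and the token format are assumptions made for illustration, not the thesis's actual implementation.

```python
# Hedged sketch of the FEC pipeline described in the abstract.
# Assumptions (not from the thesis): the threshold is a power of two,
# the 4-bit part stores a clipped exponent offset, and the output is a
# list of ("zeros", run_length) / ("code", 5_bit_value) tokens.
import numpy as np

def fec_compress(grad, exp_threshold=-8):
    """Sparsify, 5-bit-quantize, and run-length-encode a gradient tensor."""
    g = grad.astype(np.float32).ravel()

    # Step 1: exponential threshold filter -- zero out gradients whose
    # magnitude is below 2**exp_threshold, increasing sparsity.
    mask = np.abs(g) >= 2.0 ** exp_threshold
    sparse = np.where(mask, g, np.float32(0.0))
    nz = sparse != 0

    # Step 2: 5-bit quantization of the surviving values:
    # 1 bit for the sign, 4 bits for a clipped exponent offset.
    signs = (sparse < 0).astype(np.uint8)
    exps = np.zeros_like(signs)
    exps[nz] = np.clip(
        np.floor(np.log2(np.abs(sparse[nz]))) - exp_threshold, 0, 15
    ).astype(np.uint8)
    codes = (signs << 4) | exps  # packed 5-bit code per element

    # Step 3: run-length encode consecutive zeros so a long zero run
    # costs one token instead of one code per element.
    tokens, i = [], 0
    while i < len(codes):
        if not nz[i]:
            j = i
            while j < len(codes) and not nz[j]:
                j += 1
            tokens.append(("zeros", j - i))
            i = j
        else:
            tokens.append(("code", int(codes[i])))
            i += 1
    return tokens

def fec_decompress(tokens, shape, exp_threshold=-8):
    """Invert the sketch above (reconstruction is lossy by design)."""
    vals = []
    for kind, v in tokens:
        if kind == "zeros":
            vals.extend([0.0] * v)
        else:
            sign = -1.0 if (v >> 4) & 1 else 1.0
            vals.append(sign * 2.0 ** ((v & 15) + exp_threshold))
    return np.array(vals, dtype=np.float32).reshape(shape)
```

In the actual system the 5-bit codes would be bit-packed and the pipeline would run inside MXNet's parameter-server communication path with multi-threaded quantization; the sketch only shows the logical transformation applied to each gradient tensor.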
Keywords/Search Tags:Distributed deep learning, Data compression, Distributed communication, Parameter server