
A Study of Communication Optimization Algorithms on Deep Neural Networks

Posted on: 2020-01-08    Degree: Master    Type: Thesis
Country: China    Candidate: Z Zhang    Full Text: PDF
GTID: 2428330599461772    Subject: Computer software and theory
Abstract/Summary:
Deep neural networks have been successfully applied in the fields of image processing, machine translation, and speech recognition. In the face of ever-increasing data volumes, training deep neural network models in a distributed fashion is an effective solution. However, several problems remain in distributed training. First, in terms of system architecture, the current mainstream approach is the parameter server architecture. It does not assign computing nodes according to the characteristics of the different layers of a deep neural network, which results in excessive communication overhead. Second, in terms of communication data compression, the current mainstream method is gradient sparsification. Its communication complexity is too high, and the gradient values that survive sparsification are still large, which adds to the communication overhead.

To address these shortcomings of distributed deep neural network training, the Hourglass architecture and the Sparse Gradient Compression algorithm are proposed, reducing the communication overhead from two aspects: system architecture and communication data compression. Together they speed up the training process while keeping the accuracy loss within 1%.

The Hourglass architecture allocates the computation of fully connected (FC) layers and convolutional (CONV) layers to different workers. The majority of workers in the cluster compute the CONV layers, while the remaining workers compute the FC layers. The architecture makes full use of the machines' computing power: the parameters and gradients of the FC layers are exchanged only among the FC workers rather than across the whole cluster, which reduces the overall traffic.

Sparse Gradient Compression consists of hierarchical gradient sparsification, quantization, and communication delay. Specifically: (1) the hierarchical gradient sparsification algorithm lowers the communication complexity with respect to n, the number of computing nodes, and m, the transmission time required for each byte-sized message, addressing the high communication complexity of existing work; (2) the gradient quantization algorithm quantizes the sparse gradient values to 2-bit representations; (3) the communication delay algorithm lets each computing node accumulate more parameter updates by performing multiple iterations of stochastic gradient descent between communications (see the code sketch following this abstract).

Experimental results for image classification, language modeling, and speech recognition on the CIFAR-10, ImageNet, PTB, and LibriSpeech datasets demonstrate the effectiveness of the Hourglass architecture and Sparse Gradient Compression. Across these datasets and deep neural network models, and compared with state-of-the-art results for the different tasks, the Hourglass architecture and Sparse Gradient Compression improve training speed by about 2 to 15 times and achieve communication data compression ratios of about 2 to 8 times, while keeping the accuracy loss within 1%.
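The abstract describes Sparse Gradient Compression as the combination of gradient sparsification, 2-bit quantization of the surviving values, and delayed communication via extra local SGD iterations. The Python/NumPy sketch below illustrates that pipeline under stated assumptions: the top-k selection ratio, the 2-bit codebook {0, +scale, -scale}, the 0.5·scale threshold, and all function names are illustrative choices of this summary, not the thesis's actual implementation; the Hourglass worker grouping is not shown.

```python
import numpy as np


def sparsify_topk(grad, ratio=0.01):
    """Top-k gradient sparsification: keep only the largest-magnitude entries.

    Returns the flat indices and values of the retained entries. Dropped
    entries would typically be accumulated locally and sent in a later
    round (a common companion technique, assumed here).
    """
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]


def quantize_2bit(values):
    """Quantize sparse gradient values to 2-bit codes.

    Illustrative scheme: one shared scale (the mean magnitude) plus a
    per-value code in {0: zero, 1: +scale, 2: -scale}. The thesis's exact
    codebook is not specified in the abstract.
    """
    scale = float(np.mean(np.abs(values))) if values.size else 0.0
    codes = np.zeros(values.shape, dtype=np.uint8)
    codes[values > 0.5 * scale] = 1
    codes[values < -0.5 * scale] = 2
    return codes, scale


def dequantize_2bit(codes, scale):
    """Reconstruct approximate gradient values from the 2-bit codes."""
    lookup = np.array([0.0, scale, -scale])
    return lookup[codes]


def delayed_local_updates(params, grad_fn, lr=0.01, local_steps=4):
    """Communication delay: run several local SGD iterations before any
    gradient exchange, so each node communicates less frequently."""
    for _ in range(local_steps):
        params = params - lr * grad_fn(params)
    return params
```

In this sketch, a worker would run delayed_local_updates for a few steps, call sparsify_topk on its accumulated gradient, quantize the kept values, and transmit only (idx, codes, scale); receiving workers reconstruct an approximate sparse update with dequantize_2bit and apply it to their model replicas.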
Keywords/Search Tags:Distributed computation, Deep neural networks, Communication optimization, Gradient compression, Communication delay