Research And Implementation Of Efficient Parameter Communication Technology In Distributed Deep Learning System

Posted on: 2020-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: X T Chen
Full Text: PDF
GTID: 2428330611493214
Subject: Engineering

Abstract/Summary:
In recent years, artificial intelligence technology based on neural networks has been widely applied and developed in both academia and industry. As the model size of neural networks and the amount of data required for training grow rapidly, it has become increasingly difficult to train neural networks on a single machine. Distributed training of neural networks not only greatly shortens training time, but also provides a solution for neural networks that cannot be trained on a single machine at all. For the foreseeable future, distributed training of neural networks is an inevitable choice, so improving its efficiency and scalability is particularly important.

This paper proposes a low-precision distributed update (LPDU) algorithm, which converts the original floating-point gradients into a low-precision data format for transmission, reducing synchronization overhead and improving the efficiency of distributed training. A mixed-precision update (MPU) algorithm ensures training accuracy. By analyzing the overhead of each part of the LPDU algorithm and the parameter sizes of the neural networks, we derive the theoretical performance of the LPDU algorithm for specific neural network training and verify it experimentally. A comparison of the training accuracy curves of the LPDU algorithm and the original update algorithm shows that LPDU reaches the same accuracy as the original update algorithm on image classification and object detection tasks, while improving the efficiency of distributed training. With 8 training nodes, the efficiency of ResNet50 increases from 84.05% to 87.15%, the training efficiency of VGG increases from an original 79.42% to 86.55%, and the SSD network improves by 4.83% over its original efficiency.

Building on the LPDU idea of reducing the gradient mantissa, and on the different gradient precision requirements of classification networks and object detection networks, three extreme precision gradient compression (EPGC) algorithms are proposed: a 9-bit gradient compression algorithm and an 8-bit gradient compression algorithm for training classification networks, and an 11-bit gradient compression algorithm for training object detection networks. The 9-bit algorithm removes all mantissa bits of the single-precision floating-point format, representing each gradient with 1 sign bit and 8 exponent bits. The 8-bit algorithm removes 8 mantissa bits from the half-precision format, representing each gradient with 1 sign bit, 5 exponent bits, and 2 mantissa bits. The 11-bit algorithm removes 5 mantissa bits from the half-precision format, representing each gradient with 1 sign bit, 5 exponent bits, and 5 mantissa bits. To quickly verify the feasibility of the three proposed compression algorithms, this paper simulates them by zeroing the corresponding mantissa bits of single-precision or half-precision floating-point numbers. Experiments show that all three gradient compression algorithms preserve the training accuracy of neural networks on classification and object detection tasks, so improving the efficiency and scalability of distributed training through these three gradient compression algorithms is feasible.
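The core LPDU/MPU idea described above can be illustrated with a minimal sketch, assuming a NumPy-based simulation: gradients are cast to half precision before communication (standing in for the low-precision transmission), averaged, and then applied to a full-precision master copy of the parameters. The function name `lpdu_step` and the use of a plain Python list of per-worker gradients are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def lpdu_step(fp32_grads_per_worker, fp32_params, lr=0.01):
    """Illustrative LPDU-style update step.

    Each worker casts its fp32 gradient to fp16 before communication,
    the averaged gradient is cast back to fp32, and the update is applied
    to the fp32 master copy of the parameters (mixed-precision update).
    """
    # "Transmit" low-precision gradients: cast fp32 -> fp16 per worker.
    fp16_grads = [g.astype(np.float16) for g in fp32_grads_per_worker]
    # Aggregate in fp32 (stand-in for an all-reduce across workers).
    avg_grad = np.mean([g.astype(np.float32) for g in fp16_grads], axis=0)
    # Mixed-precision update: apply the averaged gradient to fp32 parameters.
    return fp32_params - lr * avg_grad

params = np.random.randn(4).astype(np.float32)
grads = [np.random.randn(4).astype(np.float32) for _ in range(8)]  # 8 workers
params = lpdu_step(grads, params)
```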
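The mantissa-zeroing simulation used to verify the three EPGC formats can likewise be sketched as follows, again assuming NumPy; the helper names and the choice of bit masks are assumptions for illustration, matching only the bit layouts stated in the abstract (9-bit: 1 sign + 8 exponent; 8-bit: 1 sign + 5 exponent + 2 mantissa; 11-bit: 1 sign + 5 exponent + 5 mantissa).

```python
import numpy as np

def zero_mantissa_fp32(grad, keep_bits):
    """Zero the low (23 - keep_bits) mantissa bits of float32 gradients,
    keeping the sign bit and the 8-bit exponent."""
    bits = grad.astype(np.float32).view(np.uint32)
    mask = np.uint32((0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)

def zero_mantissa_fp16(grad, keep_bits):
    """Zero the low (10 - keep_bits) mantissa bits of float16 gradients,
    keeping the sign bit and the 5-bit exponent."""
    bits = grad.astype(np.float16).view(np.uint16)
    mask = np.uint16((0xFFFF << (10 - keep_bits)) & 0xFFFF)
    return (bits & mask).view(np.float16).astype(np.float32)

g = (np.random.randn(8) * 1e-3).astype(np.float32)
g9  = zero_mantissa_fp32(g, keep_bits=0)  # 9-bit format: 1 sign + 8 exponent bits
g8  = zero_mantissa_fp16(g, keep_bits=2)  # 8-bit format: 1 sign + 5 exponent + 2 mantissa bits
g11 = zero_mantissa_fp16(g, keep_bits=5)  # 11-bit format: 1 sign + 5 exponent + 5 mantissa bits
```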
Keywords/Search Tags: neural network, distributed training, low precision, update algorithm, gradient compression