
Communication Optimization Of Distributed Deep Learning Based On Gradient Priority

Posted on: 2021-10-12  Degree: Master  Type: Thesis
Country: China  Candidate: Y L Xiang  Full Text: PDF
GTID: 2518306107968849  Subject: Computer technology
Abstract/Summary:
The advent of deep learning has made everyday life increasingly convenient. Deep learning uses deep network structures with large numbers of parameters to achieve high model accuracy. With the rapid growth of information, the volume of data generated by humans has exploded, and a single node can no longer complete training on massive data in a reasonable time, which has given rise to distributed deep learning. The parameter server is a communication topology widely used in distributed deep learning: it divides the cluster into worker nodes and server nodes. Worker nodes perform the computation and communicate with the server nodes, while the server nodes receive the gradients from the worker nodes and perform the aggregation. Because computation and communication are serial within a worker node, resources are underutilized.

To address this underutilization in the parameter-server architecture, strategies for overlapping computation and communication based on non-preemptive priority and preemptive priority are proposed, and the overlapping strategy is further optimized. In the gradient-priority strategy, the gradients computed during backpropagation are assigned different priority levels: the lower the layer index, the higher the priority. When gradients are pushed, they are sent to the server node in priority order, so the high-priority gradients of the lower layers can be pushed to the server node and aggregated earlier, allowing the next round of iterative computation to start earlier. Priority-based pushing comes in two variants: non-preemptive and preemptive. In the non-preemptive strategy, the next gradient can only be pushed after the current gradient has finished; in the preemptive strategy, a high-priority gradient can interrupt a low-priority one so that it reaches the server node for aggregation more quickly.

To verify the effectiveness of the two communication strategies, the open-source distributed deep learning library BigDL was modified. Experiments with three common deep learning models, analyzed in terms of cluster scalability and execution time, show that the gradient-priority strategies significantly outperform the default strategy on both measures.
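As an illustration of the scheduling idea described above, the following is a minimal Python sketch of priority-ordered gradient pushing with a non-preemptive and a preemptive variant. The class and function names (GradientChunk, PriorityPusher, PreemptivePusher, send, send_slice) are hypothetical and are not taken from the thesis or from BigDL; the sketch only shows how lower-layer gradients could be favoured during the push phase.

    import heapq
    from dataclasses import dataclass, field

    # Minimal sketch of priority-ordered gradient pushing on a worker node.
    # Names are hypothetical and do not come from the thesis or from BigDL.

    @dataclass(order=True)
    class GradientChunk:
        priority: int                  # layer index: smaller value = higher priority
        layer: str = field(compare=False)
        data: bytes = field(compare=False)

    class PriorityPusher:
        """Non-preemptive: once a push starts, it runs to completion;
        the priority queue only decides which gradient is sent next."""
        def __init__(self):
            self._queue = []

        def enqueue(self, chunk: GradientChunk):
            heapq.heappush(self._queue, chunk)

        def push_all(self, send):
            # 'send' stands in for the actual push to the parameter server.
            while self._queue:
                send(heapq.heappop(self._queue))   # cannot be interrupted

    class PreemptivePusher(PriorityPusher):
        """Preemptive: each push is split into slices, and between slices a
        newly enqueued higher-priority gradient may interrupt the current one."""
        def push_all(self, send_slice, slice_size=1 << 16):
            while self._queue:
                chunk = heapq.heappop(self._queue)
                offset = 0
                while offset < len(chunk.data):
                    send_slice(chunk, chunk.data[offset:offset + slice_size])
                    offset += slice_size
                    if (offset < len(chunk.data) and self._queue
                            and self._queue[0].priority < chunk.priority):
                        # Re-queue the unsent remainder and yield to the
                        # higher-priority gradient.
                        heapq.heappush(self._queue,
                                       GradientChunk(chunk.priority, chunk.layer,
                                                     chunk.data[offset:]))
                        break

In this sketch the non-preemptive variant only orders whole pushes by the heap, while the preemptive variant breaks each push into slices so that a newly arrived higher-priority (lower-layer) gradient can take over between slices, mirroring the two strategies contrasted in the abstract.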
Keywords/Search Tags: Distributed deep learning, Parameter server, Priority scheduling, Preemptive interruption