
Communication Optimization Of Distributed Deep Learning Based On Gradient Priority

Posted on: 2021-10-12  Degree: Master  Type: Thesis
Country: China  Candidate: Y L Xiang  Full Text: PDF
GTID: 2518306107968849  Subject: Computer technology
Abstract/Summary:
The advent of deep learning has made everyday life increasingly convenient. Deep learning uses deep network structures with large numbers of parameters to achieve high model accuracy. With the rapid growth of information, the volume of data generated by humans has exploded, and a single node can no longer complete training on massive data in a reasonable time, which has given rise to distributed deep learning. The parameter server is a communication topology widely used in distributed deep learning: it divides the cluster into worker nodes and server nodes. Worker nodes perform the computation and communicate with the server nodes, while the server nodes receive the gradients from the worker nodes and perform the aggregation. Because computation and communication are serial within a worker node, resources are underutilized.

To address this underutilization in the parameter-server architecture, strategies for overlapping computation and communication based on non-preemptive priority and preemptive priority are proposed, and the overlapping strategy is further optimized. In the gradient-priority strategy, the gradients computed during backpropagation are assigned different priority levels: the lower the layer index, the higher the priority. When gradients are pushed, they are sent to the server node in priority order, so the high-priority gradients of the lower layers can be pushed to the server node and aggregated earlier, allowing the next round of iterative computation to start earlier. Priority-based pushing comes in two variants: non-preemptive and preemptive. In the non-preemptive strategy, the next gradient can only be pushed after the current gradient has finished; in the preemptive strategy, a high-priority gradient can interrupt a low-priority one so that it reaches the server node for aggregation more quickly.

To verify the effectiveness of the two communication strategies, the open-source distributed deep learning library BigDL was modified. Experiments with three common deep learning models, analyzed in terms of cluster scalability and execution time, show that the gradient-priority strategies significantly outperform the default strategy on both measures.
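As an illustration of the scheduling idea described above, the following is a minimal Python sketch of priority-ordered gradient pushing with a non-preemptive and a preemptive variant. The class and function names (GradientChunk, PriorityPusher, PreemptivePusher, send, send_slice) are hypothetical and are not taken from the thesis or from BigDL; the sketch only shows how lower-layer gradients could be favoured during the push phase.

    import heapq
    from dataclasses import dataclass, field

    # Minimal sketch of priority-ordered gradient pushing on a worker node.
    # Names are hypothetical and do not come from the thesis or from BigDL.

    @dataclass(order=True)
    class GradientChunk:
        priority: int                  # layer index: smaller value = higher priority
        layer: str = field(compare=False)
        data: bytes = field(compare=False)

    class PriorityPusher:
        """Non-preemptive: once a push starts, it runs to completion;
        the priority queue only decides which gradient is sent next."""
        def __init__(self):
            self._queue = []

        def enqueue(self, chunk: GradientChunk):
            heapq.heappush(self._queue, chunk)

        def push_all(self, send):
            # 'send' stands in for the actual push to the parameter server.
            while self._queue:
                send(heapq.heappop(self._queue))   # cannot be interrupted

    class PreemptivePusher(PriorityPusher):
        """Preemptive: each push is split into slices, and between slices a
        newly enqueued higher-priority gradient may interrupt the current one."""
        def push_all(self, send_slice, slice_size=1 << 16):
            while self._queue:
                chunk = heapq.heappop(self._queue)
                offset = 0
                while offset < len(chunk.data):
                    send_slice(chunk, chunk.data[offset:offset + slice_size])
                    offset += slice_size
                    if (offset < len(chunk.data) and self._queue
                            and self._queue[0].priority < chunk.priority):
                        # Re-queue the unsent remainder and yield to the
                        # higher-priority gradient.
                        heapq.heappush(self._queue,
                                       GradientChunk(chunk.priority, chunk.layer,
                                                     chunk.data[offset:]))
                        break

In this sketch the non-preemptive variant only orders whole pushes by the heap, while the preemptive variant breaks each push into slices so that a newly arrived higher-priority (lower-layer) gradient can take over between slices, mirroring the two strategies contrasted in the abstract.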
Keywords/Search Tags: Distributed deep learning, Parameter server, Priority scheduling, Preemptive interruption