
Parameter Server Placement for Distributed Training

Posted on: 2022-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Yan
Full Text: PDF
GTID: 2518306782452494
Subject: Investment
Abstract/Summary:
With the development of machine learning and deep learning, traditional centralized model training can no longer meet the industry's needs for larger model parameter scales and higher training efficiency. Distributed training supports models with large-scale parameters and accelerates the training process by improving the utilization of the computational resources of multiple machines. The placement strategy of parameter servers is one of the important factors affecting the training time of distributed deep learning.

In this thesis, the parameter server placement problem on edge computing nodes under dynamically changing available storage capacity is studied for the multi-parameter-server architecture. Parameter servers are mainly responsible for the uploading and downloading of model parameters and gradients by edge nodes. Because the network environment and the computing and storage capacity of edge nodes change in real time, choosing appropriate edge nodes as parameter servers helps reduce the distributed training time, and abnormal nodes need to be handled during training to ensure its normal execution. After simplifying the constraints and variables of the proposed problem, a similarity relation between it and the asymmetric K-center problem is established theoretically, and the NP-hardness of the proposed problem is proved.

In view of the above, the whole training process is divided into two stages: an initial placement stage and a continuous training stage. For the initial placement stage, an approximation algorithm and a randomized rounding algorithm are proposed to solve the initial static parameter server placement problem. The approximation algorithm, based on an approximation algorithm for the asymmetric K-center problem, selects an edge node as the global parameter server and then determines the local parameter servers by solving an integer program obtained by converting and relaxing the original problem; expanding the search front, it repeatedly chooses other edge nodes as the global parameter server until all edge nodes have been traversed. Finally, the placement with the shortest training time is selected as the final parameter server placement strategy of this stage. The randomized rounding algorithm converts the problem into a linear program and solves it with randomized rounding.

In the continuous training stage, an adjustment algorithm is proposed to adjust the parameter server placement after each epoch, taking into account the real-time change of the available storage space of edge nodes, so as to maintain training stability and reduce the training time of the next epoch.

In this thesis, the distributed network environment is simulated locally with Docker virtualization technology, and common deep learning models and datasets are adopted as training tasks. The performance of the asymmetric-K-center-based parameter server placement algorithms and existing service deployment algorithms is analyzed under different environmental factors. The simulation results show that the approximation algorithm and the randomized rounding algorithm proposed in this thesis outperform the existing algorithms under all experimental settings. Meanwhile, under the same settings, the global-model training time achieved by the proposed approximation algorithm is very close to that of the optimal solution generated by brute-force search, and both proposed algorithms are clearly faster than brute force in algorithm running time. In the continuous training stage, when the available storage capacity changes, the proposed adjustment algorithm likewise outperforms the existing algorithms.
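The enumerate-and-evaluate idea behind the initial placement stage (try each edge node as the global parameter server, place local parameter servers subject to storage constraints, and keep the placement with the shortest estimated time) can be sketched roughly as follows. This is a simplified illustration, not the thesis's exact integer-programming formulation: the greedy nearest-eligible-node rule stands in for the relaxed integer program, and all node names, latencies, and storage values are hypothetical.

```python
# A minimal sketch, assuming latency is a dict-of-dicts of pairwise delays,
# storage maps each node to its free capacity, and model_size is the space
# a full parameter replica needs. All values are illustrative assumptions.

def place_servers(nodes, latency, storage, model_size):
    """Return (global_server, local_servers, est_time), minimizing the
    worst-case worker -> local server -> global server latency."""
    best = None
    for g in nodes:  # candidate global parameter server
        if storage[g] < model_size:
            continue  # g cannot hold a parameter replica
        # Nodes with enough free storage may host a local parameter server.
        eligible = [n for n in nodes if storage[n] >= model_size]
        worst = 0.0
        locals_used = set()
        for w in nodes:
            if w == g:
                continue
            # Greedy stand-in for the relaxed integer program: each worker
            # syncs with the closest eligible node (possibly g itself).
            s = min(eligible, key=lambda n: latency[w][n])
            locals_used.add(s)
            # Worker pushes to its local server, which syncs with g.
            worst = max(worst, latency[w][s] + latency[s][g])
        if best is None or worst < best[2]:
            best = (g, locals_used, worst)
    return best

# Tiny illustrative instance: node "c" lacks storage for a replica.
nodes = ["a", "b", "c"]
latency = {"a": {"a": 0, "b": 1, "c": 5},
           "b": {"a": 1, "b": 0, "c": 2},
           "c": {"a": 5, "b": 2, "c": 0}}
storage = {"a": 10, "b": 10, "c": 1}
g, local_servers, est_time = place_servers(nodes, latency, storage, model_size=2)
```

The outer loop mirrors the "repeat with another global server until all edge nodes are traversed" step; the final `best` corresponds to selecting the shortest-time placement as the stage's output.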
Keywords/Search Tags:Distributed deep learning, parameter server placement, approximation algorithm, NP problem