
Distributed Training Optimization In Heterogeneous Clusters

Posted on: 2021-01-12  Degree: Master  Type: Thesis
Country: China  Candidate: H H Yu  Full Text: PDF
GTID: 2428330602499099  Subject: Computer system architecture
Abstract/Summary:
In recent years, deep learning has been widely used in fields such as image processing and natural language processing, and its success has been driven by the joint development of big data, algorithmic models, and computing power. As data sets and model sizes keep growing to improve the generality of deep learning models, training time increases and the demand for computing power rises. Because the computing resources of a single machine are limited, distributed training on clusters has become common in recent years: the training process that originally ran on a single machine is distributed across multiple machines and executed in parallel. Popular deep learning systems such as TensorFlow and MXNet support distributed training, but they are only well suited to homogeneous clusters in which all nodes have similar computing performance. In real production environments, however, there are heterogeneous clusters in which nodes differ in hardware performance or compete with other jobs for resources on the same node. When distributed training is performed on such heterogeneous clusters, different training nodes (workers) iterate at different speeds. If the traditional bulk synchronous parallel (BSP) or asynchronous parallel (ASP) distributed SGD algorithms are used, slow workers drag down the overall training speed through synchronization waiting or stale parameters. This thesis therefore optimizes the distributed training process in heterogeneous clusters. The main work is as follows:

(1) We analyze the problems that the BSP and ASP algorithms commonly used in existing distributed deep learning systems, as well as several distributed SGD variants, encounter in heterogeneous clusters. On this basis, we propose an adaptive k-syn/asyn SGD algorithm based on the runtime state of workers: the timing of parameter updates is controlled flexibly according to each worker's iteration time and parameter version (a minimal sketch of this idea appears after the abstract). In this way we balance the impact of slow workers caused by synchronization waiting against that caused by parameter staleness, maximizing system throughput and reducing convergence time.

(2) To further improve the efficiency of distributed training in heterogeneous clusters, on top of adaptive k-syn/asyn SGD we propose a batch size allocation algorithm based on worker performance. It predicts the computation speed of each worker with a GRU (Gated Recurrent Unit) recurrent neural network and exponential smoothing, and allocates a corresponding batch size to each worker (see the second sketch below). Through this algorithm we minimize the gap between worker iteration times and eliminate the effect of slow workers at its root.

(3) We use these methods to optimize TensorFlow and then test and analyze the distributed training performance of the adaptive k-syn/asyn SGD algorithm and the batch size allocation algorithm on homogeneous clusters, artificially simulated hardware-heterogeneous clusters, and heterogeneous clusters with multi-job resource competition. The experimental results show that, compared with BSP and ASP, our methods significantly improve the distributed training performance of the system in heterogeneous clusters.
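The following is a minimal Python sketch of the parameter-update idea described in (1): fresh gradients are aggregated synchronously in groups of k, while a gradient whose parameter version lags too far behind is applied asynchronously so the slow worker neither blocks the group nor keeps training on very stale parameters. It is an illustration under assumed names (AdaptiveKSyncServer, staleness_bound, push_gradient), not the thesis's actual implementation.

```python
import numpy as np

class AdaptiveKSyncServer:
    """Toy parameter server mixing k-synchronous and asynchronous updates."""

    def __init__(self, dim, k=4, staleness_bound=8, lr=0.01):
        self.weights = np.zeros(dim)   # global model parameters
        self.version = 0               # global parameter version counter
        self.k = k                     # gradients aggregated per synchronous update
        self.staleness_bound = staleness_bound
        self.lr = lr
        self.pending = []              # gradients buffered for the next sync update

    def push_gradient(self, grad, worker_version):
        """Called by a worker with its gradient and the version it was computed on."""
        staleness = self.version - worker_version
        if staleness >= self.staleness_bound:
            # Worker is far behind: apply its gradient asynchronously right away
            # so it can pull fresh parameters instead of waiting for the group.
            self._apply(grad)
        else:
            # Worker is reasonably fresh: buffer it for k-synchronous aggregation.
            self.pending.append(grad)
            if len(self.pending) >= self.k:
                self._apply(np.mean(self.pending, axis=0))
                self.pending.clear()

    def _apply(self, grad):
        self.weights -= self.lr * grad
        self.version += 1

    def pull(self):
        """Workers fetch the latest parameters and version before each iteration."""
        return self.weights.copy(), self.version
```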
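The second sketch illustrates performance-proportional batch size allocation as described in (2). The thesis combines a GRU predictor with exponential smoothing; for brevity this illustration keeps only the exponential-smoothing estimate. Function names (predict_speed, allocate_batches) and the smoothing factor are assumptions, not the thesis's code.

```python
def predict_speed(prev_estimate, measured_samples_per_sec, alpha=0.3):
    """Exponentially smoothed estimate of a worker's throughput (samples/s)."""
    if prev_estimate is None:
        return measured_samples_per_sec
    return alpha * measured_samples_per_sec + (1 - alpha) * prev_estimate

def allocate_batches(speed_estimates, global_batch_size):
    """Split the global batch across workers in proportion to predicted speed,
    so each worker needs roughly the same wall-clock time per iteration."""
    total = sum(speed_estimates.values())
    return {w: max(1, round(global_batch_size * s / total))
            for w, s in speed_estimates.items()}

# Example: one fast worker and two slower ones sharing a 512-sample global batch.
speeds = {"worker0": 900.0, "worker1": 450.0, "worker2": 300.0}
print(allocate_batches(speeds, 512))   # e.g. {'worker0': 279, 'worker1': 140, 'worker2': 93}
```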
Keywords/Search Tags: Distributed training, Heterogeneous cluster, Distributed SGD algorithm, Parameter version, Performance prediction