
Research On Parameter Communication Optimization For Distributed Machine Learning System

Posted on: 2024-03-13    Degree: Master    Type: Thesis
Country: China    Candidate: P R Xu    Full Text: PDF
GTID: 2568307103974639    Subject: Computer Science and Technology
Abstract/Summary:
Deep learning has become one of the core research topics in the field of artificial intelligence, and AI applications such as autonomous driving and face recognition are developing rapidly. However, as machine learning data sets and deep learning models keep growing, training a model on a single machine can take several weeks or longer, which is no longer acceptable in a big-data environment. It is therefore necessary to distribute traditional machine learning workloads across multiple nodes to speed up training. At present, most distributed machine learning systems are implemented on top of a parameter server architecture. When a parameter server is used for distributed training, the nodes must communicate with each other to guarantee the correctness of the model.

The traditional communication model for distributed machine learning is the Bulk Synchronous Parallel (BSP) model. BSP uses a synchronization barrier: when a worker node reaches the barrier, it suspends training until all worker nodes have arrived, after which a global synchronization is performed. As a result, the slowest worker node drags down the computing efficiency of the entire cluster. To address this problem, industry and academia have proposed the Asynchronous Parallel (ASP) model and the Stale Synchronous Parallel (SSP) model. The ASP model removes the synchronization barrier entirely to make full use of cluster performance, but it over-exploits the error tolerance of machine learning and can ultimately prevent the model from converging. The SSP model bounds the iteration gap between the fastest and the slowest worker node with a staleness threshold, but it still does not fully account for the various factors in a cluster environment and cannot adapt well to real clusters.

Aiming at the problems of these mainstream parameter communication models for distributed machine learning, this thesis proposes an Efficient Synchronous Parallel model and a Predicted Synchronous Parallel model, implements the distributed machine learning framework Kanel, and implements both models on Kanel. The research content of this thesis is as follows:

(1) This thesis analyzes in depth the problems of the parameter communication models used by current mainstream distributed machine learning systems and proposes a new communication model, the Efficient Synchronous Parallel model. The model reduces communication delay by optimizing the communication between the parameter server and the worker nodes: each worker node judges whether communicating with the parameter server is currently efficient. If the worker judges the communication to be efficient, it communicates with the parameter server; otherwise it skips the communication and directly starts the next round of local training. After designing and implementing the Efficient Synchronous Parallel model, this thesis proves theoretically that the model converges correctly and can guarantee the accuracy of the distributed machine learning model. Finally, experiments verify that the Efficient Synchronous Parallel model outperforms the Bulk Synchronous Parallel, Asynchronous Parallel and Stale Synchronous Parallel models.
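To make the worker-side decision concrete, the following Python fragment is a minimal sketch of the idea described above, not the thesis's actual Kanel implementation: the helper names (local gradient computation, ps.estimate_comm_time, push_gradients/pull_params) and the efficiency_ratio threshold are assumptions introduced here purely for illustration.

    import time

    def esp_worker_loop(ps, model, data_iter, num_iters, efficiency_ratio=1.0):
        # ps, model and data_iter are assumed interfaces (push/pull of parameters,
        # local gradient computation, an iterator of mini-batches); they stand in
        # for whatever the real framework provides.
        for step in range(num_iters):
            start = time.perf_counter()
            grads = model.compute_gradients(next(data_iter))   # one local training step
            compute_time = time.perf_counter() - start

            # Estimated cost of a push/pull round trip with the parameter server,
            # e.g. a moving average of past round-trip times.
            comm_time = ps.estimate_comm_time()

            if comm_time <= efficiency_ratio * compute_time:
                # Communication judged efficient: synchronize with the server.
                ps.push_gradients(grads)
                model.set_params(ps.pull_params())
            else:
                # Communication judged inefficient: skip it and continue with
                # another round of purely local training.
                model.apply_gradients(grads)

Under this reading, the efficiency test simply compares the expected communication cost with the cost of one more local iteration; the thesis's concrete criterion may differ.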
(2) Aiming at the large synchronization delay of current mainstream parameter communication models, this thesis proposes the Predicted Synchronous Parallel model. The model estimates the performance of each worker node from its training time in the previous iteration, predicts how the cluster will behave in the next iteration, and searches for the optimal synchronization time that minimizes the synchronization waiting time of the nodes. To further improve the computing efficiency of the worker nodes, a fast node in the Predicted Synchronous Parallel model continues training on its local model and local data set while waiting for synchronization, until it receives the new global model parameters. This thesis presents the design and implementation of the Predicted Synchronous Parallel model; the results show that it further reduces the synchronization time, and experiments demonstrate that it accelerates machine learning training while preserving model convergence.
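As a minimal Python sketch of the prediction step, assuming the simplest possible predictor (each worker's next iteration takes as long as its previous one); the function name predict_sync_time, the prev_times input and the returned local_budget are hypothetical names for illustration, not the thesis's Kanel API.

    def predict_sync_time(prev_times):
        """Predict per-worker finish times for the next iteration and choose the
        synchronization point as the predicted finish time of the slowest worker,
        so faster workers know how long they can keep training locally before the
        global synchronization."""
        predicted = dict(prev_times)          # naive prediction: reuse last iteration's time
        sync_time = max(predicted.values())   # synchronize when the slowest worker is expected to finish
        # Each worker's waiting budget: time it can spend on extra local training
        # before the new global parameters are expected to arrive.
        local_budget = {w: sync_time - t for w, t in predicted.items()}
        return sync_time, local_budget

    # Example with made-up timings (seconds):
    prev = {"worker0": 2.1, "worker1": 3.4, "worker2": 2.8}
    sync_time, budget = predict_sync_time(prev)
    # sync_time == 3.4; worker0 could spend about 1.3 s on extra local training before the barrier.

A real predictor could of course use a smoothed history of iteration times rather than only the last iteration, but the structure of the decision would stay the same.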
Keywords/Search Tags:Distributed Machine Learning, Parameter Server, Communication Optimization, Efficient Synchronous Parallel