
Research On Efficient Communication Strategies For Distributed Machine Learning

Posted on: 2023-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z A Ren
Full Text: PDF
GTID: 2568306908467324
Subject: Communication and Information System
Abstract/Summary:
With the development of big data and artificial intelligence, machine learning technology has been widely applied across industry. To meet the needs of increasingly complex deep learning scenarios, the scale of machine learning models and their training datasets has grown steadily, and training on a server equipped with a single GPU can take weeks or even months; this long training time has hindered both research progress and the application of machine learning algorithms. Distributed training can effectively reduce the training time of machine learning models. Among distributed strategies, data parallelism is the most common in industry: the dataset is partitioned across computing nodes, and each node updates the model parameters through a specific synchronous communication algorithm, thereby accelerating training. However, as cluster scale grows, distributed data parallelism faces serious communication overhead. To realize efficient distributed machine learning training, this thesis focuses on the synchronous communication algorithm in the data parallel strategy. The main research contents are as follows:

Aiming at the problem that the communication time of the existing ring synchronization scheme is limited by the minimum link bandwidth between GPUs in the cluster, this thesis proposes an Unbalanced-Topology based Gradients Division Strategy (UTGDS). The strategy constructs a new gradient synchronization method through gradient division and the construction of horizontal and vertical communication groups, so that both the links within a server and the links between servers are fully utilized, and it derives the optimal gradient partition value, thereby addressing the problem of unbalanced topology bandwidth. Two-GPU and eight-GPU server clusters were built, and a distributed machine learning system was deployed on both to test the acceleration effect of the algorithm. The experimental results show that the communication time of UTGDS in a high-bandwidth cluster is reduced by 2.7%-8.5% compared with the 2D-Torus algorithm, and the higher the cluster bandwidth, the better the acceleration effect of UTGDS.

To reduce the risk of network congestion in the parameter server algorithm and cut the amount of data transmitted in the network, this thesis proposes a hybrid synchronization algorithm N-MSA (Network-based Mixed Synchronization Algorithm) built on in-network computing. The hybrid algorithm first aggregates part of the parameters within each server, then runs the parameter server algorithm between servers, and finally overwrites the parameters within each server with the aggregated result. On this basis, the idea of in-network computing is introduced, and programmable devices take over the functions of parameter aggregation and distribution. In the experiments, a GPU server plays the role of a programmable switch, and the cluster NIC bandwidth is adjusted to test different bandwidth modes on a two-GPU server cluster. The results show that when the cluster bandwidth is 1 Gbps, the communication time of N-MSA is reduced by 85.4% and 61.5% compared with the parameter server algorithm and MSA respectively; after the bandwidth is upgraded to 10 Gbps, N-MSA reduces communication time by 61.3% and 35.2% compared with the same two baselines.
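The horizontal/vertical group idea behind UTGDS can be illustrated with a toy simulation: gradients are first reduce-scattered within each server over the fast intra-server links, then only each GPU's shard crosses the slower inter-server links, and finally an intra-server allgather reassembles the full gradient. This is a minimal sketch under assumed names and an equal-shard layout, not the thesis implementation (which additionally derives an optimal, unequal partition).

```python
# Toy simulation of hierarchical gradient synchronization with
# horizontal (intra-server) and vertical (inter-server) groups.
# The function name, the list-of-lists layout, and the equal
# sharding are illustrative assumptions.

def hierarchical_allreduce(grads):
    """grads[s][g] is the gradient vector on GPU g of server s.
    Returns the same structure with the global sum on every GPU."""
    n_servers = len(grads)
    n_gpus = len(grads[0])          # GPUs per server
    dim = len(grads[0][0])
    shard = dim // n_gpus           # each GPU owns one equal shard

    # Step 1: intra-server reduce-scatter over the fast links.
    # GPU g of each server accumulates shard g of that server's sum.
    partial = [[None] * n_gpus for _ in range(n_servers)]
    for s in range(n_servers):
        for g in range(n_gpus):
            lo, hi = g * shard, (g + 1) * shard
            partial[s][g] = [sum(grads[s][k][i] for k in range(n_gpus))
                             for i in range(lo, hi)]

    # Step 2: inter-server allreduce, per shard, over the slower
    # network: only 1/n_gpus of the gradient crosses each link.
    for g in range(n_gpus):
        total = [sum(partial[s][g][i] for s in range(n_servers))
                 for i in range(shard)]
        for s in range(n_servers):
            partial[s][g] = total[:]

    # Step 3: intra-server allgather reassembles the full gradient.
    out = [[None] * n_gpus for _ in range(n_servers)]
    for s in range(n_servers):
        full = [x for g in range(n_gpus) for x in partial[s][g]]
        for g in range(n_gpus):
            out[s][g] = full[:]
    return out
```

With two servers of two GPUs each, every GPU ends up holding the global sum while each inter-server link carried only half of the gradient, which is why the minimum-bandwidth link no longer bounds the whole synchronization.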
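For reference, the parameter server baseline that MSA and N-MSA are compared against can be sketched as a single push-aggregate-pull round: every worker sends its full gradient to one central node, which averages the gradients, applies the update, and returns the new parameters. The class and method names below are illustrative assumptions, not the thesis code; the point is that all traffic converges on one node, which is the congestion risk the thesis targets.

```python
# Minimal single-round parameter server: workers push full
# gradients, the server averages them and applies one SGD step,
# workers then pull the updated parameters.

class ParameterServer:
    def __init__(self, params, lr=0.1):
        self.params = params[:]     # current model parameters
        self.lr = lr
        self.buffer = []            # gradients pushed this round

    def push(self, grad):
        # A worker uploads its locally computed gradient.
        self.buffer.append(grad)

    def aggregate(self):
        # Average all pushed gradients and apply one SGD update.
        n = len(self.buffer)
        avg = [sum(g[i] for g in self.buffer) / n
               for i in range(len(self.params))]
        self.params = [p - self.lr * a for p, a in zip(self.params, avg)]
        self.buffer.clear()

    def pull(self):
        # A worker downloads the updated parameters.
        return self.params[:]
```

Each round, every worker's full gradient crosses the network to the same node; aggregating part of the parameters inside each server first (as in the hybrid scheme) or inside the network fabric itself (as in N-MSA) shrinks exactly this central traffic.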
Keywords/Search Tags:distributed machine learning, data parallelism, synchronization algorithm, unbalanced topology, in-network computing