With the continuous development of Internet technology, the amount of data is growing exponentially. Big data has driven data-parallel distributed model training, and reducing the enormous communication cost of large-scale distributed training is currently a hot research topic. Some existing efforts attempt to reduce communication cost through gradient compression and model pruning. Since the emergence of programmable switches with in-network computing capability, such as the Intel Tofino switch, a method called in-network aggregation has successfully reduced the communication cost of the parameter aggregation stage and alleviated bandwidth bottlenecks in clusters. However, the parameter update stage is equally important in distributed model training, and existing solutions do not consider reducing its communication cost. To address this issue, we make two main contributions.

First, we exploit the in-network computing capability of programmable switches to construct multicast trees for parameter updates in distributed model training. Under the resource constraints of programmable switches, we propose a parameter update optimization algorithm called INMNT, based on randomized rounding, which reduces redundant data transmission during parameter updates and thereby lowers cluster bandwidth consumption. The algorithm achieves an approximation ratio of O(log |V|), where |V| is the number of programmable switches. We conducted small-scale experiments on programmable switches, demonstrating that our algorithm accelerates communication in distributed model training. To further validate its performance, we also conducted large-scale simulation experiments; the results show that our solution reduces downstream communication cost by 14.5% to 35.8% compared with the state-of-the-art solution.

Second, since data centers often run a massive number of tasks, building a multicast tree for every parameter update session may cause resource constraint violations in the network. We therefore investigate the multicast and reconfiguration problems for parameter update sessions under capacity constraints on links and programmable switches in a multi-task scenario, with the goal of maximizing training throughput. We first construct multiple candidate multicast trees for each session based on different resource utilization scenarios, and then optimize multicast tree placement in the multi-task setting, considering offline and online scenarios as complementary cases. For the offline scenario, we propose an efficient algorithm called KMMT with an approximation ratio of O(1/(log n/α + 1)), where α ≥ 1 is a function of the minimum link/programmable-switch capacity and the maximum multicast demand, and n is the number of devices in the network. For the online scenario, we develop a method called OMSM with a competitive ratio of O(log n/α + 1), which balances network throughput against the continuity of existing traffic. Additionally, we devise a multicast reconfiguration method that prevents congestion during reconfiguration and retains the same asymptotic throughput guarantee as KMMT. Extensive simulations show that our algorithms achieve significant performance improvements over previous multicast algorithms.
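
To make the randomized-rounding idea behind INMNT more concrete, the following minimal Python sketch illustrates the general technique under simplified assumptions: a fractional LP solution is rounded by sampling each candidate link with probability equal to its LP value, and the sampling is repeated for several rounds, which is what yields the logarithmic approximation factor in standard analyses of this technique. The topology, LP values, and function names here are hypothetical and do not come from the paper; the actual INMNT algorithm additionally enforces programmable-switch resource constraints.

    import random

    # Hypothetical fractional LP solution: for each candidate link, the LP
    # assigns a value in [0, 1] indicating how much it is "used" by the
    # multicast tree of one parameter-update session. These numbers are made
    # up for illustration; in the INMNT setting they would come from solving
    # an LP relaxation under switch-resource constraints.
    lp_solution = {
        ("ps", "sw1"): 1.0,
        ("sw1", "sw2"): 0.6,
        ("sw1", "sw3"): 0.4,
        ("sw2", "worker1"): 0.6,
        ("sw3", "worker1"): 0.4,
        ("sw2", "worker2"): 0.7,
        ("sw3", "worker2"): 0.3,
    }
    workers = {"worker1", "worker2"}

    def randomized_round(lp_solution, workers, rounds):
        """Pick each fractional link independently with probability equal to
        its LP value, repeating for several rounds so that every worker is
        reached with high probability."""
        chosen = set()
        for _ in range(rounds):
            for edge, frac in lp_solution.items():
                if random.random() < frac:
                    chosen.add(edge)
            reached = {dst for (_, dst) in chosen}
            if workers <= reached:
                break
        return chosen

    tree_edges = randomized_round(lp_solution, workers, rounds=8)
    print(sorted(tree_edges))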
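
Similarly, the online placement setting addressed by OMSM can be illustrated with a simplified greedy admission skeleton: sessions arrive one by one, each with a bandwidth demand and a few precomputed candidate multicast trees, and a session is either placed on a feasible candidate tree or rejected while respecting residual link capacities. This is only an illustrative sketch with made-up capacities, demands, and names, not the OMSM algorithm itself, which also handles programmable-switch capacity and provides the stated competitive ratio.

    # Illustrative skeleton of online multicast-session placement under link
    # capacity constraints; all values below are hypothetical.
    link_capacity = {
        ("sw1", "sw2"): 10.0,
        ("sw1", "sw3"): 10.0,
        ("sw2", "sw4"): 6.0,
        ("sw3", "sw4"): 6.0,
    }
    residual = dict(link_capacity)

    # Each arriving session brings a bandwidth demand and a few precomputed
    # candidate multicast trees (each tree is a set of links).
    sessions = [
        ("job-A", 4.0, [{("sw1", "sw2"), ("sw2", "sw4")},
                        {("sw1", "sw3"), ("sw3", "sw4")}]),
        ("job-B", 5.0, [{("sw1", "sw2"), ("sw2", "sw4")},
                        {("sw1", "sw3"), ("sw3", "sw4")}]),
    ]

    def admit(demand, candidate_trees):
        """Greedily place the session on the feasible candidate tree that
        leaves the largest bottleneck residual capacity; reject otherwise."""
        best_tree, best_slack = None, -1.0
        for tree in candidate_trees:
            if all(residual[link] >= demand for link in tree):
                slack = min(residual[link] - demand for link in tree)
                if slack > best_slack:
                    best_tree, best_slack = tree, slack
        if best_tree is None:
            return None  # reject: every candidate would exceed some link capacity
        for link in best_tree:
            residual[link] -= demand
        return best_tree

    for sid, demand, trees in sessions:
        placed = admit(demand, trees)
        print(sid, "placed on" if placed else "rejected",
              sorted(placed) if placed else "")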