
Research On Congestion Control Technology Of HPC Interconnection Network

Posted on: 2022-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Ma
GTID: 2518306605990359
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of cloud computing, big data, artificial intelligence and other information technologies, the amount of data carried by HPC systems has been growing explosively. Today's large-scale HPC systems consist of tens of thousands or even hundreds of thousands of servers, and the network congestion caused by large and frequent data exchanges between these servers significantly degrades HPC system performance. Congestion control has therefore become essential to keeping HPC systems running efficiently and stably. Traditional congestion control strategies suffer from slow response to congestion, inaccurate rate control and poor deployability. To address these problems, this thesis proposes a single-path congestion control scheme and a multi-path congestion control scheme. Both are based on distinguishing congestion scenarios and on the design ideas of existing congestion control strategies, combined with the characteristics of the Dragonfly+ topology and the InfiniBand protocol stack.

To solve the slow response to congestion and the inaccurate rate adjustment of existing schemes, this thesis designs a single-path congestion control strategy based on distinguishing congestion scenarios. The solution applies the idea of "divide and conquer". By recording the receiver's throughput, the number of flows the receiver is receiving concurrently, and the switch queue length, it distinguishes two congestion scenarios in the network: receiver congestion and public path congestion. The sender then applies the rate adjustment mechanism that corresponds to the detected scenario, using the quantitative congestion information to adjust the sending rate of the congested flow quickly and accurately when congestion occurs. Simulation results show that, compared with DCQCN and TIMELY, the flow completion time (FCT) under the hot-spot traffic pattern is reduced by 44.8% and 39.9%, respectively, and under the adversarial traffic pattern by 29.8% and 26.5%.

Single-path transmission cannot make full use of the path diversity of the network, so network throughput declines under some traffic patterns. Multi-path transmission solves this problem well, but it causes congestion spreading when hot spots appear in the network. In a Dragonfly+ network, using non-shortest paths also brings additional overhead. In addition, the InfiniBand protocol stack does not yet have a congestion control strategy based on multi-path transmission. This thesis therefore proposes a multi-path congestion control strategy based on LMC (LID Mask Control) path allocation. The solution reuses the mechanism of the single-path scheme for distinguishing network congestion scenarios. It distinguishes shortest and non-shortest paths in the Dragonfly+ network through a specific LID allocation mechanism, chooses the initial transmission path based on historical congestion information and per-path weights, and adjusts the transmission rate of each sub-flow according to the congestion information. Simulation results show that, compared with MP-RDMA and IB_CC, the FCT under the uniform traffic pattern is reduced by 50.5% and 16.7%, respectively. Under the hot-spot traffic pattern the FCT is reduced by 56.4% and 45.2%, and under the adversarial traffic pattern by 9.3% and 26.2%. The solution makes full use of the advantages of multi-path transmission: throughput under the adversarial traffic pattern is increased by 51.2% compared with single-path transmission.
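The abstract does not give the rules used to tell the two congestion scenarios apart, so the following Python sketch is only an illustration of the general idea under stated assumptions. The names `Feedback`, `classify` and `adjust_rate`, the thresholds `LINK_GBPS`, `RX_SATURATION` and `QUEUE_THRESHOLD_KB`, and both rate formulas are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    NONE = "none"
    RECEIVER = "receiver"        # the receiver's link is the bottleneck
    PUBLIC_PATH = "public_path"  # a switch queue on a shared path builds up


@dataclass
class Feedback:
    """The three signals named in the abstract (field names are assumed)."""
    rx_throughput_gbps: float  # throughput measured at the receiver
    rx_flow_count: int         # flows the receiver is receiving concurrently
    switch_queue_kb: float     # queue length reported by the switch


# Assumed constants, chosen only to make the example concrete.
LINK_GBPS = 100.0
RX_SATURATION = 0.95
QUEUE_THRESHOLD_KB = 100.0


def classify(fb: Feedback) -> Scenario:
    """Assumed rule: several flows saturating the receiver link means
    receiver congestion; otherwise a long switch queue means public path
    congestion."""
    if fb.rx_flow_count > 1 and fb.rx_throughput_gbps >= RX_SATURATION * LINK_GBPS:
        return Scenario.RECEIVER
    if fb.switch_queue_kb > QUEUE_THRESHOLD_KB:
        return Scenario.PUBLIC_PATH
    return Scenario.NONE


def adjust_rate(current_gbps: float, fb: Feedback) -> float:
    """Pick a new sending rate with a scenario-specific (assumed) rule."""
    scenario = classify(fb)
    if scenario is Scenario.RECEIVER:
        # Cap the flow at an equal share of the receiver link.
        return min(current_gbps, LINK_GBPS / fb.rx_flow_count)
    if scenario is Scenario.PUBLIC_PATH:
        # Scale down in proportion to how far the queue exceeds the threshold.
        return current_gbps * QUEUE_THRESHOLD_KB / fb.switch_queue_kb
    return current_gbps
```

For example, with four flows saturating a receiver, `adjust_rate(100.0, Feedback(98.0, 4, 20.0))` caps the sender at a quarter of the assumed link rate, while a long switch queue with an unsaturated receiver triggers the proportional decrease instead. A real design would also need a rate recovery rule, which the abstract does not describe.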
Keywords/Search Tags: HPC, Interconnect Network, Congestion Control, InfiniBand, Dragonfly+