Research On Congestion Control In Data-center Network

Posted on: 2024-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Liu
Full Text: PDF
GTID: 2568306944967359
Subject: Communication Engineering (including broadband network, mobile communication, etc.) (Professional Degree)
Abstract/Summary:
As large language model training, large-scale machine learning, heterogeneous computing, resource disaggregation, and similar applications place ever higher demands on data processing, storage, and transmission, traditional TCP can no longer deliver the high throughput and low latency required in data center networks, and RDMA (Remote Direct Memory Access) technology emerged in response. RDMA greatly improves transmission efficiency by allowing the sender and receiver network cards to directly access pre-registered memory buffers. However, low-latency, high-throughput environments place increasingly strict requirements on congestion control, which has led to a variety of congestion control algorithms for RDMA. Among them, DCQCN is currently the standard congestion control protocol for RoCE (RDMA over Converged Ethernet). This thesis studies congestion control for data center networks based on RDMA transmission. The main work completed includes:

1. A study of the problems in the rate update mechanism of DCQCN and the design of a parameter-adaptive rate adjustment method. The work addresses a series of problems in the DCQCN algorithm: complex parameter tuning, a fixed timer threshold that makes it difficult to cope with large-scale Incast events, a fixed byte counter threshold that causes fairness issues, and a fixed step size that makes convergence difficult. Adaptive parameters are designed, including an adaptive byte counter and timer for the rate increase phase and an adaptive step size for the additive increase process. A data center network simulation environment was built on the NS3 network simulator to simulate the topology, traffic characteristics, and Incast scenarios of a data center network. The experimental results show that the adaptive parameters improve the performance of the DCQCN algorithm under large-scale burst traffic, ensure fairness and convergence, and simplify complex parameter configuration under different network conditions.

2. The design of a new set of network state feedback information for the HPCC algorithm. HPCC uses the inflight bytes on a link as its congestion feedback. It estimates the inflight bytes, compares them with the BDP (bandwidth-delay product) to determine whether congestion has occurred, and adjusts the sending window accordingly. An analysis of existing congestion control algorithms shows that differences in the congestion feedback they use account for their differences in performance. Therefore, drawing on the high sensitivity of gradient changes to congestion in the TIMELY algorithm, the queue gradient is considered in addition to the HPCC inflight bytes. Both types of feedback information are measured accurately using INT (In-band Network Telemetry) and then fused. The improved algorithm maintains a shorter switch queue without losing network throughput. Because it responds quickly, the improved algorithm is also suited to dynamic network environments and bursty traffic patterns.
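The HPCC feedback described above (inflight bytes compared against the BDP, followed by a window adjustment) can be illustrated with a minimal Python sketch. This is a simplified illustration of the baseline idea only, not the thesis's improved algorithm: it does not include the queue gradient or the fusion step. All function names, the `target_util` value of 0.95, and the `additive_bytes` term are assumptions made for the example.

```python
# Simplified, hypothetical sketch of an HPCC-style inflight-bytes check.
# Names and default values are illustrative assumptions, not taken from the thesis.

def bdp_bytes(bandwidth_bps, base_rtt_s):
    """Bandwidth-delay product of a link, in bytes."""
    return (bandwidth_bps / 8.0) * base_rtt_s


def estimate_inflight_bytes(queue_bytes, tx_rate_bps, base_rtt_s):
    """Estimate inflight bytes as queued bytes plus bytes sent during one base RTT.

    In practice queue_bytes and tx_rate_bps would come from per-hop telemetry
    (for example INT); here they are plain arguments.
    """
    return queue_bytes + (tx_rate_bps / 8.0) * base_rtt_s


def is_congested(inflight, bdp):
    """A link is treated as congested when inflight bytes exceed its BDP."""
    return inflight > bdp


def adjust_window(window_bytes, inflight, bdp, target_util=0.95, additive_bytes=0.0):
    """Scale the window so that normalized inflight bytes move toward target_util.

    utilization > target_util shrinks the window; utilization < target_util grows it.
    """
    utilization = inflight / bdp
    if utilization <= 0:
        # No measurable load: fall back to a purely additive increase.
        return window_bytes + additive_bytes
    return window_bytes / (utilization / target_util) + additive_bytes


if __name__ == "__main__":
    # Assumed example link: 100 Gbit/s with an 8 microsecond base RTT.
    bdp = bdp_bytes(100e9, 8e-6)
    inflight = estimate_inflight_bytes(queue_bytes=50_000, tx_rate_bps=100e9, base_rtt_s=8e-6)
    print("congested:", is_congested(inflight, bdp))
    print("new window:", adjust_window(bdp, inflight, bdp))
```

In this sketch a standing queue pushes the estimated inflight bytes above the BDP, so the window is scaled down; when the inflight bytes fall below the target share of the BDP, the same formula scales the window up.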
Keywords/Search Tags:datacenter network, congestion control, RDMA