Congestion Control For RDMA-based Datacenter Networks

Posted on: 2023-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: H R Wei
GTID: 2558306914462864
Subject: Information and Communication Engineering
Abstract:
With the large-scale deployment of Remote Direct Memory Access (RDMA) networking in data centers, distributed applications can transfer data with very low CPU overhead over high-performance networks that offer high throughput (200 Gbps) and ultra-low latency (less than 10 µs per hop). However, as distributed applications scale up, DCQCN, the congestion control scheme used by commercial NICs, is no longer effective at relieving the congestion caused by large incast loads. Although Priority-based Flow Control (PFC) can prevent packet loss, PFC can cause congestion spreading and head-of-line blocking. New congestion control algorithms are therefore needed to achieve low latency, high bandwidth, and high reliability in high-speed networks.

Most existing RDMA congestion control algorithms are implemented directly in NIC hardware. This study proposes SECC, a software-implemented congestion control mechanism for non-programmable hardware NICs in data centers. SECC adds a congestion control module to the RDMA network library and uses a window-based congestion control mechanism. By exploiting the characteristics of RDMA semantics, SECC performs low-overhead rate control that is transparent to upper-layer applications, using zero-copy fragmentation and reassembly. Experimental results show that SECC effectively improves the NIC's ability to absorb incast traffic without replacing the NIC. In the incast workload, enabling SECC reduces the average completion time by 18.8% and the tail completion time by 79.5%.

In addition, we observe that the incast traffic generated by distributed training exhibits a certain periodicity. Based on this characteristic, we introduce RECC, an end-to-end congestion control scheme based on RTT and ECN. RECC combines the two indicators, ECN and RTT, into its congestion signal: it quantifies the degree of congestion by sending RTT probes and uses an ECN-based scheme to respond to congestion more quickly. Meanwhile, by measuring the actual transmission rate at the sender, RECC can identify intermittent congestion, avoid increasing the rate excessively under low load, and speed up the convergence of the congestion control algorithm. We implement RECC on a new generation of Mellanox hardware NICs without requiring any additional network function support. Both large-scale simulations and real-traffic experiments show that RECC effectively reduces network congestion and improves flow completion time. In our experiments on a lossless network with simulated distributed-training traffic, RECC reduces flow completion time by up to 34.8% and PFC magnitude by 95% compared with DCQCN.
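The abstract describes SECC only at a high level: a window-based limit enforced inside the RDMA network library, with zero-copy fragmentation and reassembly. The following Python sketch is an illustration under stated assumptions, not SECC's actual code. It assumes the window counts outstanding fragments, that the window follows an AIMD rule, and that `memoryview` slices can stand in for zero-copy fragments of a registered buffer. The class `WindowSender`, the function `reassemble`, and all constants are hypothetical.

```python
from collections import deque


class WindowSender:
    """Hypothetical window-limited sender; names and constants are illustrative."""

    def __init__(self, chunk: int = 4096, cwnd: float = 8.0,
                 min_cwnd: float = 1.0, max_cwnd: float = 64.0):
        self.chunk = chunk          # fragment size in bytes (assumed)
        self.cwnd = cwnd            # congestion window, counted in fragments
        self.min_cwnd = min_cwnd
        self.max_cwnd = max_cwnd
        self.inflight = 0           # fragments posted but not yet completed
        self.pending = deque()      # (offset, memoryview) fragments awaiting send

    def submit(self, message: bytes) -> None:
        # Zero-copy fragmentation: memoryview slices share the caller's buffer.
        view = memoryview(message)
        for off in range(0, len(view), self.chunk):
            self.pending.append((off, view[off:off + self.chunk]))

    def poll_send(self) -> list:
        # Post fragments only while the in-flight count is below the window.
        posted = []
        while self.pending and self.inflight < int(self.cwnd):
            posted.append(self.pending.popleft())
            self.inflight += 1
        return posted

    def on_completion(self, congested: bool) -> None:
        # Each completion frees one slot; the window follows an assumed AIMD rule.
        self.inflight -= 1
        if congested:
            self.cwnd = max(self.min_cwnd, self.cwnd / 2)
        else:
            self.cwnd = min(self.max_cwnd, self.cwnd + 1.0 / self.cwnd)


def reassemble(fragments, total_len: int) -> bytes:
    # Each fragment lands at its own offset, mimicking a one-sided RDMA write
    # into a pre-registered receive buffer, so no reordering logic is needed.
    buf = bytearray(total_len)
    for off, frag in fragments:
        buf[off:off + len(frag)] = frag
    return bytes(buf)


if __name__ == "__main__":
    sender = WindowSender(chunk=4096, cwnd=2.0)
    msg = bytes(range(256)) * 40            # 10240 bytes -> 3 fragments
    sender.submit(msg)
    first = sender.poll_send()              # window allows 2 fragments
    sender.on_completion(congested=False)   # window grows to 2.5
    second = sender.poll_send()             # one more slot opens
    print(len(first), len(second), reassemble(first + second, len(msg)) == msg)
```

Because the library, not the application, decides when each fragment is posted, a design like this can throttle a large message without the application seeing anything other than a normal completion, which is one way to read the abstract's claim of transparency to upper-layer applications.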
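The abstract states that RECC combines ECN (a fast signal) with RTT probes (a quantitative measure of congestion) and uses the sender's measured transmission rate to avoid over-increasing the rate under low load, but it does not give the control law. The Python sketch below is one hypothetical way to combine those three inputs: an ECN mark triggers a multiplicative cut whose depth is scaled by an RTT-derived congestion degree, and the rate grows additively only when the measured rate shows the sender is actually using its allowance. The class `HybridRateController`, the degree formula, and every constant (base RTT, target RTT, step sizes, thresholds) are assumptions for illustration.

```python
class HybridRateController:
    """Hypothetical sender-side controller; every constant here is an assumption."""

    def __init__(self, line_rate=200.0, base_rtt=10.0, target_rtt=30.0,
                 ai_step=5.0, beta=0.5, min_cut=0.1, util_threshold=0.9):
        self.line_rate = line_rate          # link capacity, Gbps
        self.rate = line_rate               # allowed sending rate, Gbps
        self.base_rtt = base_rtt            # uncongested RTT, microseconds
        self.target_rtt = target_rtt        # RTT treated as fully congested
        self.ai_step = ai_step              # additive increase per update, Gbps
        self.beta = beta                    # largest multiplicative cut
        self.min_cut = min_cut              # smallest degree applied once ECN is seen
        self.util_threshold = util_threshold
        self.min_rate = line_rate / 1000    # floor so the flow never stalls

    def congestion_degree(self, rtt: float) -> float:
        # Map an RTT probe onto [0, 1]: 0 at base RTT, 1 at or above target RTT.
        d = (rtt - self.base_rtt) / (self.target_rtt - self.base_rtt)
        return min(1.0, max(0.0, d))

    def update(self, ecn_marked: bool, rtt: float, measured_rate: float) -> float:
        if ecn_marked:
            # ECN reacts quickly; the RTT-derived degree scales how hard to cut.
            cut = self.beta * max(self.min_cut, self.congestion_degree(rtt))
            self.rate = max(self.min_rate, self.rate * (1.0 - cut))
        elif measured_rate >= self.util_threshold * self.rate:
            # Grow only when the sender actually uses most of its allowance.
            self.rate = min(self.line_rate, self.rate + self.ai_step)
        # Otherwise the flow is application-limited (low load): hold the rate.
        return self.rate


if __name__ == "__main__":
    ctl = HybridRateController()
    print(ctl.update(ecn_marked=True, rtt=30.0, measured_rate=200.0))   # deep queue
    print(ctl.update(ecn_marked=False, rtt=10.0, measured_rate=40.0))   # low load
    print(ctl.update(ecn_marked=False, rtt=10.0, measured_rate=95.0))   # busy sender
```

Holding the rate when the measured rate is well below the allowed rate keeps an idle or bursty flow from accumulating a large allowance between bursts, which would otherwise let it inject a burst at full rate into an already periodic incast.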
Keywords: RDMA, Congestion Control, Datacenter Networks