Data Center Network Load Optimization Strategy And Research

Posted on: 2022-11-10  Degree: Master  Type: Thesis
Country: China  Candidate: H Tang  Full Text: PDF
GTID: 2518306776994619  Subject: Automation Technology
Abstract/Summary:
With the rapid development of data centers, both user experience and cloud service business have entered a new era. Traditional data centers use the TCP/IP protocol stack, which incurs high latency and CPU load during data transmission because of the multiple layers of packet encapsulation and parsing. The problem is especially evident at high bandwidths, which motivated the emergence of Remote Direct Memory Access (RDMA) technology. With its low latency, low CPU load and high throughput, RDMA is better suited to today's data center networks. However, the zero-copy and kernel-bypass techniques used in RDMA only reduce the load incurred during packet encapsulation and parsing. When the network is congested, a large number of packets time out and are retransmitted, which also causes serious load problems. Therefore, in addition to introducing RDMA to address the high load caused by data transmission at high bandwidth, the load must be optimized further through the flow control scheme and the congestion control algorithm.

RDMA relies on the Priority-based Flow Control (PFC) mechanism to provide a lossless network. However, because PFC operates at too coarse a granularity and relies on a single detection criterion, its flow control can cause head-of-line (HOL) blocking and unfairness. In severe cases it triggers cascading pauses, which produce a large number of packet timeouts and aggravate the load at the source. The DCQCN congestion control algorithm in the RoCEv2 stack focuses only on guaranteeing network throughput and ignores the long queues that its fixed threshold causes in the switch buffer, which also leads to packet timeouts and extra overhead at the source under congestion. This thesis therefore focuses on further optimizing the load, after RDMA has been introduced, by improving the flow control scheme and the congestion control algorithm, as follows.

First, the thesis analyzes in depth how the overly aggressive PFC mechanism leads to HOL blocking and cascading pauses. To address the coarse granularity and single judgment criterion of PFC, it proposes a PFC mechanism based on a fuzzy comprehensive evaluation model, which evaluates network congestion from multiple angles and indicators in order to find the flows that actually cause the congestion. These congested flows are then scheduled with a priority plus time-slice polling algorithm, and each scheduling decision is completed in O(1) time with the help of a hash table. Compared with the traditional PFC mechanism, the improved scheme effectively alleviates HOL blocking and cascading pauses. NS-3 simulation experiments demonstrate the effectiveness and soundness of the scheme.

Second, the thesis addresses the large tail latency that arises because the DCQCN algorithm ignores queue length, which aggravates the load at the source during congestion. A mathematical model is established to analyze the root cause. Because DCQCN detects congestion by comparing the real-time queue length against a fixed threshold, the thesis proposes the DCQCN-DMT algorithm. The algorithm constructs a source-side rate model from the changes in the sender's transmit rate, analyzes the root causes that affect queue length, and introduces a performance factor that reflects the degree of network congestion, so that the threshold is adjusted dynamically. This reduces the queue backlog in the switch buffer and the tail latency while keeping network throughput at a high level, which greatly alleviates the load at the source under congestion. NS-3 simulation experiments show that, in high-bandwidth scenarios, DCQCN-DMT reduces the switch buffer queue length by 20% compared with the traditional DCQCN algorithm while keeping throughput unchanged, further optimizing the load and improving network performance.
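
The fuzzy comprehensive evaluation step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not list the indicators, weights, membership functions or congestion grades, so the names `queue_share`, `rate_share` and `burstiness`, the weight values, the triangular membership function and the three grades below are all assumptions chosen for illustration. The sketch uses the standard weighted-average operator B = W · R and picks the grade with the largest membership.

```python
# Hypothetical indicators, weights and grades; none of these come from the thesis.
INDICATORS = ("queue_share", "rate_share", "burstiness")
WEIGHTS = (0.5, 0.3, 0.2)  # assumed weight vector W, sums to 1
GRADES = ("light", "moderate", "heavy")


def membership(x):
    """Assumed triangular membership of a normalized value x in [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    light = max(0.0, 1.0 - 2.0 * x)
    heavy = max(0.0, 2.0 * x - 1.0)
    moderate = 1.0 - light - heavy
    return (light, moderate, heavy)


def evaluate(flow):
    """Return (grade, scores) for one flow using B = W . R."""
    # R: one membership row per indicator, one column per grade.
    rows = [membership(flow[name]) for name in INDICATORS]
    scores = [
        sum(w * row[j] for w, row in zip(WEIGHTS, rows))
        for j in range(len(GRADES))
    ]
    best = max(range(len(GRADES)), key=scores.__getitem__)
    return GRADES[best], scores


def flows_to_pause(flows):
    """Select only flows graded 'heavy' rather than a whole priority class."""
    return [fid for fid, f in flows.items() if evaluate(f)[0] == "heavy"]


flows = {
    "a": {"queue_share": 0.9, "rate_share": 0.8, "burstiness": 0.9},
    "b": {"queue_share": 0.1, "rate_share": 0.2, "burstiness": 0.1},
}
paused = flows_to_pause(flows)
```

In this sketch only flow `a` is selected, which mirrors the abstract's goal of acting on the flows that actually drive congestion instead of pausing every flow that shares a priority queue.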
Keywords/Search Tags:data center, Remote Direct Memory Access technology, PFC, fuzzy comprehensive evaluation model, congestion control