With the emergence and rapid development of high-performance services such as distributed machine learning training and cloud data storage, applications place increasingly stringent requirements on network throughput, end-to-end latency, and CPU overhead at end hosts. This has driven the deployment and evolution of high-performance transport protocols, represented by RoCE (RDMA over Converged Ethernet), in data centers. However, to support the expansion and iterative upgrade of upper-layer services, the link speed of high-speed lossless networks continues to increase and the network scale continues to grow. As a result, network congestion becomes significantly more frequent, which seriously degrades the throughput and end-to-end latency of applications. Although existing solutions have improved on the design of the native RoCE transport protocol, their performance is either far from optimal or comes at the cost of substantial modifications to network equipment. Therefore, three key issues remain to be explored in high-speed lossless data center networks: how to enhance the transport protocol so that it adapts to the ever-evolving network infrastructure and can itself keep evolving; how to achieve a wider range of optimal control with less resource overhead; and how to provide the greatest support for potential emerging scenarios.

This thesis studies the congestion control problem in high-speed lossless data center networks and makes the following contributions:

1. An explicit rate-matching-based fast congestion feedback mechanism, NetCC, is proposed. Weighing the pros and cons of hop-by-hop flow control and end-to-end congestion control, this thesis designs NetCC to achieve fast and accurate congestion feedback without any modification to the RNIC, thereby effectively improving the host's ability to handle network congestion. Hardware experiments and software simulations show that NetCC achieves end-to-end performance comparable to the state-of-the-art HPCC, and reduces the average flow completion time (FCT) of DCQCN and TIMELY by up to 50% and 70%, respectively.

2. A control-theory-based RDMA fast congestion feedback scheme, PACC, is designed. From the perspective of system modeling and classical control theory, this thesis enhances the mechanism by which the switch generates congestion signals, basing it on a PI controller, and thus effectively enlarges the stability domain and improves the dynamic adaptability of the system. Theoretical analysis, testbed experiments, and large-scale simulations show that PACC effectively reduces the overhead of congestion feedback, responds to congestion quickly, accurately, and fairly, and improves average FCT by 6-69% compared with existing algorithms.