
Network Update In Datacenter Network

Posted on: 2021-11-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Qu
Full Text: PDF
GTID: 1488306548992099
Subject: Management Science and Engineering
Abstract/Summary:
In the era of the information economy, information management relies on the latest advances in computer, network, and communication technology to interconnect all kinds of distributed information resources, so as to achieve overall optimization and economies of scale. The data center network is one of the most important network technologies today. It connects a large number of computing and storage servers to meet the demand for high-speed computing and mass storage in a more economical and convenient way. However, changes to the internal network topology, switch upgrades, virtual machine migration, and switch and link failures all trigger updates of the data center network, that is, changes to the paths along which traffic is transmitted.

Network update faces several challenges. First, migrating to different destination network states affects the network differently. Second, global traffic rescheduling can optimize network throughput to accommodate more traffic, but it also forces more existing traffic to migrate within the network. Third, the network carries a large number of concurrent flows that may compete for the same link resources, which makes traffic rescheduling harder; furthermore, scheduling at the flow level does not take the nature of update events into account. Fourth, scheduling update events sequentially from a queue may cause head-of-line blocking, which increases the average event completion time. Finally, packets lost during a link failure must still be recovered by the end host's retransmission mechanism. The delay this mechanism introduces may interrupt the network update and, more importantly, sharply increases the completion time of delay-sensitive short flows, degrading the service quality of upper-layer applications.

This dissertation studies five problems in network update and data plane design: destination state selection, minimum-cost migration network update, efficient event-based network update, delayed event-based network update, and network update under link failure. The research results are as follows.

This dissertation is the first to address the selection of the optimal destination state for network update. In a data center network there are multiple paths between a source and a destination address, so many destination states are possible after an update. The migration sequences generated by different destination states affect the network differently, that is, they yield different packet loss rates and delay jitter, so optimizing the destination state is crucial to reducing the impact of network update on network performance. To this end, we first generate all candidate destination states and then propose an update strategy that determines the best one by observing how the transition from the initial state to each candidate affects the network. We find that the destination state whose transition migrates the fewest flows has the least impact on the network.

We formulate a typical network update problem as the rescheduling of a group of flows under link bandwidth constraints. For this problem we propose two update mechanisms, Lupdate and Lupdate-S, to minimize traffic migration during network update. The basic idea is to schedule each new flow locally onto a shortest path while migrating as little existing background traffic as possible. Because migrating a single background flow may not free enough link bandwidth for the new flow, Lupdate-S allows multiple background flows to be migrated. We conducted large-scale data-driven evaluations on the widely used fat-tree data center network and on ER random networks. The experimental results show that, even when a high proportion of links is highly utilized, our method achieves congestion-free network update with as little traffic migration as possible.

We extend network update from the flow level to the event level to reflect the characteristics of update events. For traffic originating from different update events, we optimize the update cost and the event completion time (ECT) of each update event. We use an approximation algorithm to optimize the update cost of update events and propose two effective methods, LMTF and P-LMTF. LMTF preferentially schedules events with low update cost to reduce the average ECT. P-LMTF builds on LMTF and uses an opportunistic update method to find events that can be executed simultaneously with the head-of-queue event, improving both the average and the tail ECT of update events while ensuring fairness. Data-driven experiments show that when network utilization exceeds 70%, P-LMTF reduces the average ECT by 75% and the tail ECT by 42% compared with the FIFO method.

To solve the problem of head-of-line blocking in the update event queue, we design a delayed update mechanism. We propose two partially delayed update methods, PDU and PDUN, to remove the blocking and improve update efficiency. PDU first schedules events in their arrival order to ensure fairness. At the same time, flows of the head-of-queue event that cannot be scheduled immediately are skipped, giving subsequent update events more opportunities to execute. Because most events remain incomplete only because a small fraction of their flows is delayed, PDUN reorders delayed events by the number of flows each still has remaining, so as to speed up their execution. The experimental results show that, when network utilization fluctuates between 30% and 80%, PDUN reduces the average ECT of update events by 80% to 90% compared with the FIFO strategy.

We are the first to propose and implement an in-network packet loss recovery mechanism for link failure. Because an end host incurs significant delay when retransmitting packets lost during a link failure, we want the network itself to take responsibility for recovering them. After a link failure occurs, the sender no longer needs to retransmit the data; instead, the packets are retransmitted directly from the network device. This ensures continuous packet transmission and increases network reliability, eliminating the impact of packet loss caused by link or switch failure on the completion time of delay-sensitive applications. We propose a shared queue ring (SQR) that completely eliminates packet loss during link failure and enables seamless switching to a backup path for continuous traffic transmission. We implement SQR on a Barefoot Tofino switch using the P4 programming language. Experimental results on a hardware testbed show that, for delay-sensitive workloads, SQR can completely mask link failures from endpoint transmissions, reducing the tail flow completion time (FCT) by up to four orders of magnitude.
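
The intuition behind LMTF in the abstract above (scheduling low-cost update events first to reduce the average ECT) can be illustrated with a small Python sketch. This is not the dissertation's algorithm. It is a toy model under stated assumptions: all events are queued at time zero, events execute strictly one after another, and an event's execution time equals its update cost. The function names and the sample costs are hypothetical.

```python
from typing import List, Sequence


def completion_times(costs: Sequence[float]) -> List[float]:
    """Completion time of each event when events run one after another.

    Assumption (not from the source): execution time equals update cost,
    and every event is already waiting in the queue at time zero.
    """
    finished: List[float] = []
    clock = 0.0
    for cost in costs:
        clock += cost
        finished.append(clock)
    return finished


def average_ect(costs: Sequence[float]) -> float:
    """Average event completion time (ECT) for a given execution order."""
    if not costs:
        raise ValueError("at least one event is required")
    times = completion_times(costs)
    return sum(times) / len(times)


def lowest_cost_first(costs: Sequence[float]) -> List[float]:
    """Reorder events so that the cheapest update runs first."""
    return sorted(costs)


# Hypothetical update costs, listed in arrival (FIFO) order.
arrival_order = [8.0, 1.0, 2.0]

fifo_avg = average_ect(arrival_order)
lcf_avg = average_ect(lowest_cost_first(arrival_order))

print(f"FIFO average ECT:              {fifo_avg:.2f}")
print(f"Lowest-cost-first average ECT: {lcf_avg:.2f}")
```

In this toy queue the expensive first event delays both cheap events under FIFO, while running the cheap events first lowers the average. The sketch does not model fairness or concurrent execution, which are the concerns the abstract attributes to P-LMTF.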
Keywords/Search Tags: Datacenter Networks, Software Defined Network, Network Update, Programmable Data Plane, Link Failure Recovery
Related items