
A General Systematic Network Troubleshooting Framework For Data Centers

Posted on: 2020-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: D X Sun
Full Text: PDF
GTID: 2428330575452563
Subject: Computer technology
Abstract/Summary:
In recent years, Internet services have increasingly changed the way people live. Data center networks are key infrastructure carrying online services such as web search, social media, and online commerce. Network outages can cause performance degradation or even service unavailability, with inestimable losses. For users and enterprises alike, the need for efficient troubleshooting and quick service recovery has therefore always been urgent.

As network technology continues to flourish, real production networks are becoming more and more complex, while innovation in network troubleshooting tools lags seriously behind. When a network goes down, network administrators have only a rudimentary set of tools at their disposal to track down the root cause of the outage, such as traceroute, ping, SNMP, and NetFlow. Because these tools provide limited functionality, humans are involved almost every time something goes wrong, which significantly delays the entire troubleshooting process and causes huge and unnecessary losses. As in any distributed system, the overall behavior of the network is mainly determined by the forwarding state distributed across all network devices. The logic that manages this state, called the control plane, comprises multiple network applications, all changing the forwarding state simultaneously in complex, distributed, and unpredictable ways, so network troubleshooting is never a simple task. In large-scale, complex networks such as data center networks (DCNs), troubleshooting is much more difficult still.

To address these problems, this thesis builds on the observation that network bugs usually manifest themselves as errant packet behavior in the data plane. Based on this observation, we design and implement a general systematic network troubleshooting framework. Forwarding state changes serve as a bridge between error symptoms and the related traffic; by tracking these changes in the data plane in real time, we convert network troubleshooting into the analysis of specific traffic. Together with a flexible query mechanism and customized feature extensions, we provide a seamlessly integrated network troubleshooting solution under existing hardware limitations, which reduces the difficulty of troubleshooting while significantly improving its efficiency.

We demonstrate the benefits of our framework with the following contributions. First, we propose a network troubleshooting abstraction that associates error symptoms with suspicious traffic, and we design a corresponding query language to express the observed error symptoms and pose questions to the network. Second, we present the design and implementation of a network monitoring platform that monitors network events, collects the information needed to support the abstraction, and exposes well-designed APIs for potential application extensions. Third, we present the design and implementation of an interactive troubleshooting application built on top of the platform, which provides easy-to-use interactive network debugging features by leveraging the information and processing capabilities the platform provides. Fourth, we conduct a detailed experimental evaluation to test and verify the scalability and performance of our prototype. Fifth, we introduce selective mirroring, distinct storage strategies for healthy and abnormal network traffic, and active probing. These address the bandwidth and storage challenges and bring the advantages and flexibility of active measurement, which further expands and enriches the troubleshooting features of the system and thereby improves its troubleshooting capabilities and deployment prospects.
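The idea of using forwarding state changes as a bridge between an error symptom and the related traffic can be illustrated with a minimal sketch. This is not the framework's actual data model or query language, which the abstract does not specify; the record types `StateChange` and `Symptom`, the function `suspicious_changes`, the device names, and the 60-second window are all hypothetical, chosen only to show the shape of such a lookup.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class StateChange:
    """One forwarding-state change seen on a device (hypothetical record)."""
    timestamp: float  # seconds
    device: str
    prefix: str       # destination prefix whose rule changed
    action: str       # e.g. "forward:port2" or "drop"


@dataclass(frozen=True)
class Symptom:
    """An observed symptom: traffic to dst_ip misbehaves at a given time."""
    timestamp: float
    dst_ip: str


def suspicious_changes(log, symptom, window=60.0):
    """Return changes that happened at most `window` seconds before the
    symptom and whose prefix covers the symptom's destination address."""
    dst = ip_address(symptom.dst_ip)
    return [
        change for change in log
        if 0 <= symptom.timestamp - change.timestamp <= window
        and dst in ip_network(change.prefix)
    ]


# Illustrative data only; none of these values come from the thesis.
log = [
    StateChange(100.0, "tor-1", "10.0.1.0/24", "forward:port2"),
    StateChange(130.0, "agg-3", "10.0.2.0/24", "drop"),
    StateChange(10.0, "agg-3", "10.0.2.0/24", "forward:port1"),
]
symptom = Symptom(timestamp=150.0, dst_ip="10.0.2.7")
hits = suspicious_changes(log, symptom)
for change in hits:
    print(change.device, change.prefix, change.action)
```

In this sketch, the symptom narrows the search to the state changes that are recent and cover the affected destination, so later analysis only needs to look at traffic matching those prefixes rather than at all traffic in the network.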
Keywords/Search Tags: Data Center Network, Seamless Troubleshooting, Systematic Framework