Novel Abstractions for Data Center Network Management

Posted on:2017-03-15

Degree:Ph.D

Type:Thesis

University:The University of Wisconsin - Madison

Candidate:Gember-Jacobson, Aaron

GTID:2448390005478329

Subject:Computer Science

Abstract/Summary:

Data center failures have become increasingly problematic due to the plethora of critical web and storage services hosted in today's data centers. Frequently, the problem lies in the data center network, which is prone to both functional and performance failures caused by hardware or software faults, misconfiguration, overload, or other issues with links and devices.

Preventing such failures is challenging because data center network operators lack a formal understanding of how their design and operational decisions affect the frequency of network problems. Furthermore, current frameworks for verifying and maintaining the functionality and performance of data center networks are incomplete, inefficient, or both. Consequently, this thesis explores how to analyze an organization's network management practices and efficiently guarantee that a data center network functions correctly and offers reasonable performance amidst changes in infrastructure, configuration, and workload.

We first present the design of a management plane analytics (MPA) framework, which uncovers the relationships between network management practices and the frequency of network problems. By applying MPA to over 850 data center networks operated by a large online service provider, we identify several practices that strongly impact the frequency of problems in these networks, including the number of control plane configuration changes and the number of device types (i.e., the presence of middleboxes).

Armed with this information, we explore how to design abstractions that aid in ensuring the correct and performant operation of a data center's control plane and middleboxes. We introduce an abstract representation for control planes that efficiently models a data center network's forwarding behavior under all possible link/device failure scenarios. This allows us to verify important functional invariants (e.g., that traffic between subnets S1 and S2 always traverses a middlebox) three to five orders of magnitude faster than current verification tools. Additionally, we introduce a middlebox state management framework that allows network operators to realize a "one-big-middlebox" abstraction and avoid middlebox-induced functional and performance failures in the presence of hardware/software faults or overload. Our framework guarantees the safety and consistency of transferred/replicated middlebox state with minimal latency and resource overhead.
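
To make the example invariant concrete, the following is a minimal sketch of how "traffic between S1 and S2 always traverses a middlebox" might be checked on a plain topology graph. It is not the abstract representation developed in the thesis. It assumes a topology-only model with undirected links and no routing policy, and the names `reachable`, `always_traverses_middlebox`, `links`, and the node labels are hypothetical. Under that assumption, the invariant holds for every failure scenario exactly when S2 becomes unreachable from S1 once all middlebox nodes are removed.

```python
from collections import deque


def reachable(links, src, dst, removed=frozenset()):
    """Breadth-first search over undirected links, skipping removed nodes."""
    if src in removed or dst in removed:
        return False
    adjacency = {}
    for a, b in links:
        if a in removed or b in removed:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def always_traverses_middlebox(links, src, dst, middleboxes):
    # Assumption: any surviving path may end up carrying traffic. If dst is
    # still reachable with every middlebox removed, some failure scenario
    # could leave only a middlebox-free path, so the invariant is violated.
    return not reachable(links, src, dst, removed=frozenset(middleboxes))


# Hypothetical topology: S1 - r1 - fw - r2 - S2, where fw is the middlebox.
links = [("S1", "r1"), ("r1", "fw"), ("fw", "r2"), ("r2", "S2")]
print(always_traverses_middlebox(links, "S1", "S2", {"fw"}))
# Adding a direct r1 - r2 link creates a path that bypasses the middlebox.
print(always_traverses_middlebox(links + [("r1", "r2")], "S1", "S2", {"fw"}))
```

The check is deliberately conservative: a real control plane with route filters or preferences may never select the bypass path, which is the kind of behavior a control-plane-aware model would need to capture.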

Keywords/Search Tags:

Data center, Management, Failures
