Network Improvement and Evaluation in Datacenters and Chip-Multiprocessor

Posted on: 2018-06-08
Degree: Ph.D
Type: Dissertation
University: University of Toronto (Canada)
Candidate: Dai, Wenbo
Full Text: PDF
GTID: 1448390002997072
Subject: Computer Engineering
Abstract/Summary:
Datacenters are vitally important in the current information era. They are the key enabler of cloud computing and host many important applications, including web services and machine learning. To deliver superior application performance, it is imperative to carefully design and optimize datacenter networks. We improve network design in datacenters at two granularities. At the high level, we optimize datacenter networks to better handle collective communication and network failures. At a lower level, we propose sampling-based approaches to enable fast and accurate evaluation of the Networks-on-Chip (NoCs) that are widely used in Chip-Multiprocessors (CMPs).

Current datacenters rely on IP multicast and TCP incast to support collective communication, that is, one-to-many and many-to-one traffic. However, these mechanisms struggle to meet the latency, throughput and scalability requirements of modern datacenters. We present McLink and rMcLink, which together form an efficient and scalable collective communication solution. McLink distributes multicast packets in a table-free, tree-based manner. rMcLink aggregates incast packets on the fly as they travel towards their destination, reducing the network bandwidth consumed by incast traffic. Our simulation results show that McLink reduces multicast packet delay by 16%, and rMcLink improves incast TCP flow finish time by 82%.

Datacenter networks are subject to failures. In this work, we design a fault-tolerant routing protocol, GatewayPath, which uses a graph-partitioning based algorithm to forward packets around failures. The algorithm also allows further optimization towards shortest paths or load balancing. Our evaluation shows that GatewayPath improves convergence time by more than 80%, while its packet delay is on par with that of a traditional link-state fault-tolerant routing protocol.

For CMPs used in datacenter servers, full system simulations are widely used to evaluate performance. Full system simulations are accurate but prohibitively slow, which limits the range and depth of design space exploration. We focus on accelerating NoC simulation through sampling techniques. We propose NoCLabs and NoCPoint, two sampling methodologies that utilize statistical sampling theory and traffic phase behavior, respectively. Experimental results show that NoCLabs and NoCPoint estimate NoC performance with an average error of 7% while achieving an order of magnitude speedup.
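
The general idea behind statistical sampling can be illustrated with a minimal Python sketch. This is not the NoCLabs or NoCPoint methodology, whose details the abstract does not give. It only shows the textbook approach of estimating a mean metric from a simple random sample of measurement intervals, with an approximate 95% confidence interval. The function name `sample_mean_latency`, its parameters, and the idea of a per-interval latency list are all assumptions made for illustration.

```python
import math
import random
import statistics


def sample_mean_latency(latencies, sample_size, seed=0, z=1.96):
    """Estimate mean latency from a simple random sample of intervals.

    `latencies` is a hypothetical list holding one average packet latency
    per simulation interval. Returns (estimate, half_width), where
    half_width is the approximate 95% confidence-interval half width
    under the normal approximation (z = 1.96).
    """
    if not 2 <= sample_size <= len(latencies):
        raise ValueError("sample_size must be between 2 and len(latencies)")
    rng = random.Random(seed)                    # fixed seed for repeatability
    sample = rng.sample(latencies, sample_size)  # sampling without replacement
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, z * std_err


if __name__ == "__main__":
    # Synthetic per-interval latencies (in cycles), standing in for simulator output.
    gen = random.Random(42)
    trace = [gen.gauss(30.0, 5.0) for _ in range(10_000)]
    estimate, half_width = sample_mean_latency(trace, sample_size=500)
    print(f"estimated mean latency: {estimate:.2f} +/- {half_width:.2f} cycles")
```

In a setting like this, only the sampled intervals would need detailed simulation, which is where a speedup would come from. The confidence interval indicates how far the estimate may lie from the full-run mean.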
Keywords/Search Tags: Datacenters, Network, Evaluation