Scaling High-Performance Interconnect Architectures to Many-Core Systems

Posted on:2013-06-21

Degree:Ph.D

Type:Dissertation

University:University of Michigan

Candidate:Sewell, Korey LaMar

GTID:1458390008484692

Subject:Engineering

Abstract/Summary:

The ever-increasing demand for performance scaling has made multi-core (2-8 cores) chips prevalent in today's computing systems and foreshadows the shift toward many-core (10s-100s of cores) chips in the near future. Although the potential performance gains from many-core systems remain appealing, the widespread adoption of these systems hinges on their ability to scale performance while simultaneously satisfying Quality-of-Service (QoS) and energy-efficiency constraints.

This work makes the case that the interconnect for these many-core systems has a significant impact on the aforementioned scalability issues. The impact of interconnects on many-core systems is illustrated by observing that the degree of the interconnect has a significant effect on system scalability and by demonstrating that high-radix, many-core architectures are feasible, energy-efficient, and high-performance.

The feasibility of high-radix crossbars for many-core systems is first shown through a new circuit-level building block called the Swizzle-Switch. A 32nm Swizzle-Switch uses integrated arbitration techniques to provide an energy- and area-efficient switching element that improves the scalability of crossbars to high radices. The Swizzle-Switch is shown to operate at frequencies up to 1.5GHz for 128-bit, radix-64 crossbars and to support many arbitration policies, such as Least-Recently Granted (LRG) and Round-Robin (RR). Results show that the Swizzle-Switch's LRG arbitration policy reduces the worst-case request access latency by 1.83× and 2.03× on average over round-robin and random arbitration schemes, respectively.

This work then shows how a many-core system called the Swizzle-Switch Network can use the Swizzle-Switch as the central building block for a flat crossbar interconnect. The Swizzle-Switch Network is shown to be advantageous over a traditional Network-on-Chip (NoC) for systems up to 64 cores. The Swizzle-Switch Network improves system performance by 21%, reduces L1 on-chip average miss latency by 2.2×, and decreases the standard deviation of that L1 miss latency by 3.0× relative to a Mesh NoC topology. Additionally, all of these performance benefits are obtained while providing 25% energy savings over the Mesh.

The Swizzle-Switch is also leveraged as a building block for high-radix NoC topologies that can support many-core architectures. The Swizzle-Switch-based Flattened Butterfly topology is demonstrated to provide a 15% speedup, 1.76× smaller L1 on-chip average miss latency, a 2.5× reduction in miss latency standard deviation, and 10% energy savings over the Mesh topology.

Finally, the impact that 3D stacking technology has on many-core scalability is evaluated and shown to assist crossbar and bus interconnects in scaling past their traditional limitations. A 3D-optimized Swizzle-Switch Network is able to leverage frequency gains to achieve a 15-28% speedup over a 2D Swizzle-Switch Network when using memory-intensive benchmarks. Additionally, a bus-based 64-core architecture is shown to provide an average speedup of 49× over a baseline uniprocessor system when using 3D technology.
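
The abstract names Least-Recently Granted (LRG) and Round-Robin (RR) arbitration but does not describe how either is realized in the circuit. As a rough illustration only, the following behavioral sketch in Python models both policies for a single crossbar output port. The class names, method names, and the list-based priority order are hypothetical and chosen for readability; they are not taken from the dissertation and say nothing about the Swizzle-Switch's circuit-level implementation.

```python
class LRGArbiter:
    """Behavioral sketch of least-recently-granted arbitration (hypothetical model)."""

    def __init__(self, num_inputs):
        # Priority order for one output port: index 0 holds the input that
        # was granted least recently and therefore has the highest priority.
        self.order = list(range(num_inputs))

    def arbitrate(self, requests):
        """Return the granted input index, or None if nothing is requesting."""
        requesting = set(requests)
        for port in self.order:
            if port in requesting:
                # The winner drops to the lowest priority; all other inputs
                # keep their relative order.
                self.order.remove(port)
                self.order.append(port)
                return port
        return None


class RoundRobinArbiter:
    """Behavioral sketch of round-robin arbitration (hypothetical model)."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.next_start = 0  # input index where the next search begins

    def arbitrate(self, requests):
        """Return the granted input index, or None if nothing is requesting."""
        requesting = set(requests)
        for offset in range(self.num_inputs):
            port = (self.next_start + offset) % self.num_inputs
            if port in requesting:
                # The search pointer moves just past the winner.
                self.next_start = (port + 1) % self.num_inputs
                return port
        return None


if __name__ == "__main__":
    lrg = LRGArbiter(4)
    rr = RoundRobinArbiter(4)
    # Inputs 1 and 2 contend for the same output on three consecutive cycles.
    for cycle in range(3):
        print(cycle, lrg.arbitrate([1, 2]), rr.arbitrate([1, 2]))
```

In this model, LRG priority depends on the full grant history of the port, while round-robin priority depends only on the position of the most recent winner. The latency figures quoted in the abstract come from the dissertation's own evaluation and are not reproduced by this sketch.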

Keywords/Search Tags:

System, Performance, Scaling, Interconnect, Shown, Over, Swizzle-switch, Miss latency

Related items

1	Study On Local Design Optimization Of High Performance DSP's On-Chip Storage System
2	Research In The Performance Prediction Model Under Different Frequencies
3	Large-scale Low Latency Switch Design
4	Design And Implementation Of Level One Cache Miss Pipelining On High Performance DSP
5	RF/CDMA interconnect for re-configurable VLSI systems
6	Design of advanced I/O interconnect circuits and systems in CMOS
7	Design And Implementation Of FC-AE Switch Interconnect Technology
8	Research On Architecture Of Optical Switch Array For High Performance Computer
9	Research On Multi-application High-performance Optical Interconnect Memory Access Architecture
10	System-Level Dynamic Thermal Management Key Techniques Research