Interconnection Network Architecture For Large-scale Manycore Processors And Its Performance Analysis

Posted on:2013-08-11

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Q Y Feng

Full Text:PDF

GTID:1268330422474257

Subject:Computer Science and Technology

Abstract/Summary:

High-performance multi-core, and even manycore, processors are the enabling technology for the coming exascale computing era. To efficiently exploit the unprecedented parallelism of these cores and further boost the throughput of manycore systems, it is important to provide a high-bandwidth, low-latency, low-power and highly scalable chip-scale interconnection infrastructure. Recently, the challenge of manycore processors has gradually shifted from logic design to interconnects; on-chip inter-core communication and processor-to-memory interconnects have become the bottleneck for system improvement. Advances in 3D integration technology and silicon photonic devices provide new opportunities for manycore interconnect design.

In this thesis, aiming at the manycore interconnect design challenges, we propose new interconnection network solutions for both inter-core and processor-to-memory communication by exploiting the advantages of 3D integration and silicon photonics. We also develop analytical models to study the performance of these new architectures using network calculus. The main contributions are summarized as follows.

(1) A three-dimensional flattened butterfly network for on-chip inter-core communication

Recent studies show that inter-core messages place stringent demands on transmission delay, as most of them are small control packets, e.g. cache-coherence messages. Transmission delays get much worse as more cores are integrated, for example 1000 cores. Although low-radix topologies, e.g. the popular 2D mesh, are easy to place and route, they are unable to meet the latency budget of a large-scale manycore system, because the hop count of low-radix networks grows with the number of cores. Therefore, we propose a three-dimensional flattened butterfly network for inter-core communication in large-scale manycore systems by exploiting the advantages of 3D integration technology.
We overcome the routing challenges of area-hungry high-radix routers and long global wires in the flattened butterfly using 3D stacking, and successfully embed it into multiple stacked layers by formulating the problem as an integer linear programming model. A three-dimensional flattened butterfly is very efficient for fast inter-core message transfer, because it not only employs express one-hop vertical interconnects but also provides additional links beyond the connectivity of a 2D mesh. Thus, as shown by our simulation results, the new scheme can greatly reduce inter-core message delays and boost the performance of manycore processors.

(2) A photonic burst-switched memory access network for large-scale manycore processors

Processor-to-memory schemes are vital for manycore systems, since slow memory access limits the performance of parallel computing cores. Memory bandwidth demand increases proportionally with the number of integrated cores. As projected by the ITRS, traditional electrical I/Os are unable to provide enough bandwidth for large-scale manycore systems because of the stringent power budget. Therefore, we propose a high-bandwidth, low-power optical memory access scheme for manycore processor-to-DRAM communication by exploiting the advantages of 3D integration technology and silicon photonic devices. Our photonic burst-switched (PBS) scheme is an adaptation of optical burst switching to chip-scale networks using silicon photonic devices. The PBS network meets the enormous bandwidth demand and stringent energy constraints by using high-speed, low-power, CMOS-compatible photonic devices. Furthermore, it has higher bandwidth utilization than previous wavelength-routed schemes and optical circuit-switched memory access networks because of sub-wavelength optical switching. We examine system feasibility and performance using a physically accurate network-level simulation environment. We evaluate the architecture using synthetic traffic patterns and real workload traces.
Simulation results show that our scheme achieves considerable energy savings compared with the optical circuit-switched memory access network and traditional electrical I/O schemes.

(3) A new method to reduce control-plane congestion in chip-scale OBS networks

In current optical burst switching (OBS) networks, many control-plane operations, such as shared-resource arbitration and link management, are usually performed in the electrical domain because of the absence of optical buffer devices and optical logic devices. Due to the random nature of burst arrivals at core nodes, control-plane congestion can occur in an OBS network when the short-term arrival rate of headers at a core node exceeds the maximum rate at which they can be processed. The problem gets even worse in chip-scale OBS, since 1) a chip-scale OBS network is characterized by massive numbers of short bursts (fine-grained control messages, like memory read/write requests) that have stringent requirements on communication delay; 2) the operating frequency of a chip-scale OBS network is limited by thermal constraints and a limited power budget, and therefore cannot be very high. These features intensify control-plane congestion. Thus, we propose a new approach to address the control-plane congestion problem in chip-scale OBS using traffic regulation. Before being injected, every concurrent control flow is globally regulated and coordinated so that the aggregated flows do not exceed the header processing capacity of the intermediate core nodes, which alleviates control-plane congestion. In other words, our regulation method provides some end-to-end bandwidth guarantees for each flow, resulting in a significant reduction of burst losses. To select optimal regulator parameters, we formulate the regulation method as an optimization problem.
Simulation results with both real application traces and synthetic flows show that our approach can effectively resolve control-plane congestion and achieve considerable performance improvements in terms of network delay and burst loss rate.

(4) Resource dimensioning and performance analysis of chip-scale optical networks using stochastic network calculus

The design of a chip-scale optical network is characterized by challenging trade-offs among latency, throughput, energy consumption, and silicon area requirements. These architectural parameters directly influence system performance. Thus, it is very useful to analyze them in the early stages of design so as to avoid bottlenecks and reduce design risks. We therefore develop analytical models to study the chip-scale OBS network. Using stochastic network calculus, we propose an analytical model of the ingress node to dimension buffer size and calculate end-to-end latency; we also develop a “virtual wavelength buffer” model to estimate the required number of wavelengths with respect to a tolerable burst loss probability. Analytical performance bounds on buffer size and delay are computed and compared with simulations. The simulation results verify that the bounds are tight. Using these stochastic network calculus models, we can quickly evaluate interconnect architecture parameters including buffer size, transmission delay and wavelength requirement. Our analytical models accurately depict the relationship between system performance and network architecture, so they are very useful for locating system bottlenecks, resulting in fast convergence when exploring the complex design space.

In summary, we investigate the manycore interconnect bottleneck and propose new interconnection network architectures for large-scale manycore processors; we also build analytical performance models for the new interconnect schemes using network calculus.
We contribute new solutions to the manycore communication problem and further extend the application field of network calculus theory. Our work has academic and practical value in promoting the advancement of high-performance processors.
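
As a rough illustration of the kind of bound behind contributions (3) and (4), the sketch below uses deterministic network calculus, a simpler relative of the stochastic network calculus used in the thesis. It assumes each control flow is shaped by a token-bucket regulator (burst, rate) and that a core node's header processor behaves as a rate-latency server. The class names, function names and numbers are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TokenBucket:
    """Arrival curve alpha(t) = burst + rate * t (assumed regulator model)."""
    burst: float  # maximum burst, in headers
    rate: float   # sustained rate, in headers per cycle


@dataclass(frozen=True)
class RateLatencyServer:
    """Service curve beta(t) = rate * max(t - latency, 0) (assumed node model)."""
    rate: float     # header processing capacity, in headers per cycle
    latency: float  # fixed processing latency, in cycles


def aggregate(flows: Iterable[TokenBucket]) -> TokenBucket:
    """The sum of token-bucket flows is again token-bucket constrained."""
    flows = list(flows)
    return TokenBucket(
        burst=sum(f.burst for f in flows),
        rate=sum(f.rate for f in flows),
    )


def _check_stable(flow: TokenBucket, server: RateLatencyServer) -> None:
    # Regulation goal: the aggregate rate must not exceed the node capacity.
    if flow.rate > server.rate:
        raise ValueError("aggregate rate exceeds header processing capacity")


def delay_bound(flow: TokenBucket, server: RateLatencyServer) -> float:
    """Worst-case delay: latency + burst / service rate."""
    _check_stable(flow, server)
    return server.latency + flow.burst / server.rate


def backlog_bound(flow: TokenBucket, server: RateLatencyServer) -> float:
    """Worst-case buffer occupancy: burst + arrival rate * latency."""
    _check_stable(flow, server)
    return flow.burst + flow.rate * server.latency


if __name__ == "__main__":
    # Four regulated control flows sharing one core node (made-up numbers).
    flows = [TokenBucket(burst=2.0, rate=0.125) for _ in range(4)]
    node = RateLatencyServer(rate=1.0, latency=4.0)
    total = aggregate(flows)
    print("delay bound (cycles):", delay_bound(total, node))
    print("buffer bound (headers):", backlog_bound(total, node))
```

In this toy setting, choosing regulator parameters amounts to picking each flow's burst and rate so that the aggregate stays within the node capacity while keeping the delay bound acceptable. The thesis formulates that choice as an optimization problem and derives probabilistic rather than worst-case bounds.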

Keywords/Search Tags:

Computer Architecture, Interconnection Network, Manycore Processors, Networks-on-chip, Chip-scale Optical Network, Performance Analysis, Network Calculus

Related items

1	Studies On Key Technologies Of Networks-on-Chip Interconnection For Very Large Scale Chip-multiprocessors
2	The Research On Switching Architecture And Performance Of Network On Chip In High Performance Computer
3	Calculus Models And Performance Analysis For Networks-on-chip
4	Low-power On-Chip Networks In High-Performance Multi-Core Processors
5	Design and analysis of on-chip interconnection network for multi-processor System-on-Chip
6	Research On Bufferless Optical Interconnection Networks For High Performance Computer
7	Research On New High-radix Interconnect Chip Architecture And Key Technologies For Large-scale High-performance Computing
8	Optical Network On Chip Architecture And Control
9	Performance Analysis Of Network On Chip Based On Stochastic Network Calculus
10	High-performance, Scalable Optical Network-on-chip Architecture Designs