
Routing and Topology Reconfiguration for Networks-on-Chip's Runtime Health

Posted on: 2015-07-30
Degree: Ph.D.
Type: Thesis
University: University of Michigan
Candidate: Parikh, Ritesh
Full Text: PDF
GTID: 2478390017993612
Subject: Computer Engineering
Abstract/Summary:
As silicon technology evolves, chip multi-processor (CMP) and system-on-chip (SoC) designs are dramatically changing from limited, robust and homogeneous logic blocks to integrating billions of fragile transistors into complex and heterogeneous IPs. This increased integration has compelled architects to design resource-heavy, complex and power-hungry on-chip interconnects, moving towards network-on-chip (NoC) structures. In addition, the waning reliability of silicon poses a great threat to these communication structures as they could potentially be a single point of failure. Further, the heterogeneity and fast time-to-market of upcoming computers make it nearly impossible to thoroughly verify NoC architectures and optimize them for power at design-time. Failure of NoC architectures to meet correctness, reliability and power-budget requirements has detrimental effects on the runtime operation of NoC-based CMPs and SoCs. Therefore, runtime detection and reconfiguration mechanisms are becoming a key requisite to unlock the full potential of future CMPs and SoCs. Such mechanisms can overcome both functional bugs that escaped design-time verification and device failures due to an unreliable silicon substrate. Similarly, runtime reconfiguration solutions can also be leveraged to minimize power dissipation in NoCs.
The solutions proposed in this thesis address challenges to NoCs' runtime health by employing a reactive approach, i.e., error detection followed by recovery. Further, they provide integrated detection and recovery from errors. To attain temporal error isolation, an application's execution is partitioned into fixed-time monitoring windows (epochs), during which distributed checkers, at each NoC router, monitor the traffic activity to detect anomalous behavior. If a failure is detected, a reconfiguration procedure is triggered at epoch boundaries to circumvent it. The solutions are designed to be passive, lightweight and independent of the baseline design. In addition, the design complexity is kept at a minimum and the area overhead is within 5% for all the solutions with respect to a baseline NoC. In a nutshell, this thesis provides low-cost NoC-specific solutions that enable: correct behavior by avoiding functional bugs, reliable execution by circumventing faults and power-aware operation by averting overheating. The work presented in the thesis will enable designers to aggressively push scalability and time-to-market limits with respect to NoC design.
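The detect-then-reconfigure loop described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the `RouterChecker` class, the buffer-occupancy invariant it checks, the `end_of_epoch` and `reachable` helpers, and the 2x2 mesh are all hypothetical, chosen only to show per-router checkers observing traffic during a monitoring window and a reconfiguration step running at the epoch boundary.

```python
from collections import deque


class RouterChecker:
    """Hypothetical per-router checker that counts flits during one epoch."""

    def __init__(self, buffer_capacity):
        self.buffer_capacity = buffer_capacity
        self.received = 0
        self.forwarded = 0

    def observe(self, received, forwarded):
        # Passive monitoring: only counts traffic, never alters it.
        self.received += received
        self.forwarded += forwarded

    def anomalous(self):
        # Assumed invariant (not from the thesis): the flits a router holds
        # can be neither negative nor larger than its buffer capacity.
        held = self.received - self.forwarded
        return held < 0 or held > self.buffer_capacity

    def reset(self):
        # Carry the current occupancy into the next epoch.
        self.received = max(self.received - self.forwarded, 0)
        self.forwarded = 0


def end_of_epoch(checkers, links):
    """At an epoch boundary, drop links that touch routers flagged as faulty."""
    failed = {r for r, c in checkers.items() if c.anomalous()}
    healthy = {(a, b) for a, b in links if a not in failed and b not in failed}
    for c in checkers.values():
        c.reset()
    return failed, healthy


def reachable(src, links):
    """Breadth-first search over undirected links, starting from src."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical 2x2 mesh: routers 0..3, each with a 4-flit buffer.
links = {(0, 1), (0, 2), (1, 3), (2, 3)}
checkers = {r: RouterChecker(buffer_capacity=4) for r in range(4)}
for r in (0, 2, 3):
    checkers[r].observe(received=10, forwarded=10)
checkers[1].observe(received=10, forwarded=4)  # router 1 loses flits

failed, healthy_links = end_of_epoch(checkers, links)
still_connected = reachable(0, healthy_links)
```

In this toy run, router 1 violates the assumed invariant, so the reconfigured topology excludes it and traffic from router 0 still reaches routers 2 and 3. A real design would also need deadlock-free routing on the surviving topology, which the sketch does not attempt.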
Keywords/Search Tags: Runtime, NoC, Reconfiguration