Towards a cross-layer framework for wearout monitoring and mitigation

Posted on: 2013-04-18
Degree: Ph.D
Type: Dissertation
University: University of Southern California
Candidate: Zandian, Bardia
GTID: 1458390008488132
Subject: Engineering
Abstract/Summary:
CMOS scaling has enabled a greater degree of integration and higher performance, but it has the undesirable consequence of decreased circuit reliability due to rapid wearout. Accelerated processor wearout and the consequent degradation in lifetime have become a first-order design constraint. This dissertation tackles these challenges by developing new tools to accurately quantify wearout and novel methods to quantify how software interactions with hardware affect wearout. The dissertation then demonstrates the usage of these tools by developing a wearout-aware scheduling approach that achieves wear leveling within a processor.

This dissertation first presents WearMon, an adaptive critical path monitoring architecture that provides an accurate, real-time measure of a processor's wearout-induced timing margin degradation. Special test patterns are used to check a set of critical paths in the circuit-under-test. By activating the actual devices and signal paths used in normal operation of the chip, each test captures the up-to-date timing margin of these paths. The monitoring framework dynamically adapts the testing interval and complexity based on analyses of prior test results, which increases the efficiency and accuracy of monitoring. Monitoring overhead can be completely eliminated by scheduling tests only when the circuit is idle. This wearout detection mechanism is a key building block of a hierarchical runtime reliability management system in which multiple wearout monitoring units can cooperatively engage preemptive error avoidance schemes. Our experimental results, based on an FPGA implementation, show that the proposed monitoring framework can be easily integrated into existing designs and operates with minimal overhead.

WearMon overhead can become a hurdle when a circuit block has a steep critical path timing wall. Many prior studies argued intuitively that only a few paths within a steep critical path timing wall are actually utilized by application software, but there has been a dearth of tools that enable designers to understand how software impacts the utilization of critical paths in a circuit. The next part of this dissertation develops a tool for cross-layer analysis of wearout, called WAT. WAT uses FPGA emulation closely coupled with software simulation to provide accurate insight into device switching activity and runtime path utilization. We demonstrate the utility of WAT by providing accurate gate-level switching activity statistics as inputs to a lifetime wearout simulation tool, which uses accurate device-level models of the electrophysical phenomena causing wearout. Accurate switching statistics from WAT can significantly improve lifetime prediction accuracy.

WAT is also used to address the concern regarding WearMon overhead in the presence of steep critical path timing walls. A new design-for-reliability approach is developed that reshapes a critical path wall to make a circuit more amenable to wearout monitoring. This design flow methodology uses a path utilization profile to select only a few paths to be monitored for wearout. We propose and evaluate four novel algorithms for selecting paths to be monitored. These four approaches allow designers to select the best group of paths to monitor under varying power, area, and monitoring budget constraints.

Finally, we demonstrate the impact of runtime wearout management with a proactive, runtime wearout-aware scheduling approach, WAS. Processor failure can occur due to wearout of a single structure even if the vast majority of the chip is still operational. WAS strives for uniform wearout of processor structures, thereby preventing a single structure from becoming an early point of failure. The fine-grained, microarchitecture-level chip wearout control policies use feedback from a network of timing margin monitoring sensors to identify the most degraded structures. Our evaluation shows that WAS can improve the lifespan of a multi-core processor chip by 15% to 30% with negligible impact on performance and energy consumption.
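
As a rough illustration of the wear-leveling idea behind WAS (not the dissertation's actual policy), the sketch below assumes that each processor structure reports a remaining timing margin and that each new task is steered to the structure with the most margin left. The names `Structure`, `pick_least_worn`, and `schedule`, the picosecond units, and the fixed per-task wear cost are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    name: str
    margin_ps: float  # remaining timing margin from a sensor (hypothetical units)


def pick_least_worn(structures):
    """Return the structure with the largest remaining timing margin."""
    return max(structures, key=lambda s: s.margin_ps)


def schedule(structures, tasks, wear_per_task_ps=1.0):
    """Assign each task to the least-worn structure, then charge it some wear."""
    assignment = []
    for task in tasks:
        target = pick_least_worn(structures)
        assignment.append((task, target.name))
        # Toy wear model (an assumption): every task costs a fixed margin.
        target.margin_ps -= wear_per_task_ps
    return assignment


units = [Structure("alu0", 10.0), Structure("alu1", 12.0)]
plan = schedule(units, ["t0", "t1", "t2", "t3"])
```

In this toy setup the structure that starts with more margin absorbs the early tasks until the two margins meet, after which work alternates between them, so no single structure runs far ahead of the others in degradation.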
Keywords/Search Tags:Wearout, Monitoring, Steep critical path timing, WAT, Processor, Circuit, Framework, Paths