Cross-layer fault-tolerant design and analysis for high manufacturing yield and system reliability

Posted on: 2017-02-03
Degree: Ph.D
Type: Thesis
University: University of Cincinnati
Candidate: Guo, Jianghao
Full Text: PDF
GTID: 2458390008461664
Subject: Computer Engineering
Abstract/Summary:
This research addresses a daunting problem we face today: the reliability challenge. As IC fabrication technology advances to the 10nm and 7nm technology nodes, transistor scaling and voltage scaling make it almost impossible to build 100% reliable electronic devices, as was possible 10 or 15 years ago. The reality is that we have to make chips work even though they are not perfectly built. In our view, one possible way to solve this problem is cross-layer fault tolerance, which distributes fault-tolerance tasks across different system levels. Each layer uses different techniques to solve part of the problem, and this approach can greatly improve overall system reliability.

In this research, we propose a three-layer fault-tolerant architecture to handle permanent defects in multicore CPU chips. We divide the CPU chip into several system layers based on its natural working tiers. At the gate level, we select a subset of gates for duplication based on the importance of each gate. At the second level, the micro-architecture level, we add a special structure called a spare cache to handle defects that escape the gate-level fault tolerance. At the top level, the architecture level, we use the multicore system to perform instruction migration and thread migration to handle all defects remaining from the first two levels.

We select the instruction decoding pipeline stage of the CPU as an example to evaluate the effectiveness of the proposed fault-tolerant architecture. Many innovative ideas, e.g., enhanced instruction predecoding and spare cache swapping, are proposed in this thesis to reduce the impact on system performance. Benchmark program simulations with GEM5 are used to measure the performance overhead of the new system. The hardware implementation and simulation results indicate that the proposed multi-layer fault-tolerant architecture is a low-hardware-cost, high-performance technique that can be widely used in modern multicore CPU design.
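
The three-layer handling described in the abstract could be sketched roughly as follows. This is only an illustrative Python model, not the thesis's implementation. The abstract does not say how gate importance is scored or how the spare cache is organized, so the top-k ranking under a duplication budget, the `Defect` record, the function names `select_gates_for_duplication` and `handling_layer`, and the modeling of the spare cache as a set of covered instructions are all assumptions made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    """Hypothetical record of a permanent defect in the decode stage."""
    gate: str         # name of the defective gate (illustrative identifier)
    instruction: str  # instruction whose decoding the defect corrupts


def select_gates_for_duplication(importance, budget):
    """Pick the `budget` most important gates for duplication.

    Assumption: importance is a plain numeric score per gate, and the
    selection is a simple top-k ranking. The thesis may use another rule.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(importance, key=importance.get, reverse=True)
    return set(ranked[:budget])


def handling_layer(defect, duplicated_gates, spare_cache_entries):
    """Return the lowest layer that can absorb the defect.

    Gate level: the defective gate has a duplicate.
    Micro-architecture level: the affected instruction is covered by the
    spare cache (modeled here as a set of instruction names, an assumption).
    Architecture level: everything else is migrated to another core.
    """
    if defect.gate in duplicated_gates:
        return "gate"
    if defect.instruction in spare_cache_entries:
        return "micro-architecture"
    return "architecture"


# Usage with made-up gate names, scores, and instructions.
scores = {"g1": 0.9, "g2": 0.1, "g3": 0.5}
duplicated = select_gates_for_duplication(scores, budget=2)
for d in (Defect("g1", "add"), Defect("g2", "mul"), Defect("g2", "div")):
    print(d, "->", handling_layer(d, duplicated, {"mul"}))
```

The point of the model is the fall-through order: each defect is absorbed by the cheapest layer that can cover it, and only the remainder reaches instruction or thread migration.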
Keywords/Search Tags:System, CPU, Fault