
Research On The Key Techniques Of Soft Error Tolerance Design On Multi Core Microprocessor

Posted on: 2009-04-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Gong
GTID: 1118360278456592
Subject: Computer Science and Technology
Abstract/Summary:
One of the most critical challenges in modern microprocessor design is transient faults caused by high-energy particles or random noise. These transient faults may cause soft errors and even system failures, which degrade the reliability of microprocessors. As integrated circuit technology advances, the transient fault rate of a single microprocessor keeps increasing exponentially as the number of transistors per chip grows exponentially. At the same time, multi-core microprocessors have become mainstream in the last few years. Because soft error tolerance generally requires some form of redundancy, the redundant cores of a multi-core microprocessor offer a potential basis for soft error tolerance design, and how to use these redundant resources efficiently to improve reliability has become a research focus in recent years.

This thesis presents our research on several key techniques of soft error tolerance design for multi-core microprocessors. First, we study soft error tolerant execution models, the key technique for exploiting the redundant resources of a multi-core microprocessor for soft error tolerance. Second, we study hardening techniques at the gate and architecture levels, which provide soft error masking, detection, or recovery. Finally, we study a reliability evaluation model for microprocessors whose results can be used to guide the design of highly reliable microprocessors.

The primary innovations of this thesis are as follows.

(I) Two soft error tolerant execution models for multi-core microprocessors are proposed.
(1) The dual core redundancy execution model based on context saving and recovery (DCR) executes two copies of a given program on different cores. By enhancing the cores with context saving and recovery, it recovers from soft errors with low inter-core FIFO bandwidth demand and low implementation complexity.
(2) The reconfigurable triple core redundancy execution model (TCR) executes three threads of a program on different cores. Once a soft error is detected, the model is reconfigured to mask the failed core, so soft errors are masked with low inter-core FIFO bandwidth demand and high execution performance.

(II) Two gate-level redundancy structures based on asynchronous circuits are proposed.
(1) The dual modular redundancy based on the C element (DMR) uses the asynchronous C element to mask corrupted values in a dual redundant device. It tolerates single event upsets (SEU) while effectively reducing die area overhead.
(2) The temporal spatial triple modular redundancy based on the dual clock triggered register (TSTMR) can mask both SEU and single event transient (SET) faults. The dual clock triggered register (DCTREG) has the same explicitly separated master and slave latch structure as a de-synchronized pipeline; it uses one clock for sample enable and another for output enable, so temporal redundancy can be implemented at the gate level.

(III) An enhanced control flow checking technique (ECFC) is proposed, consisting of a checking method and an implementation method.
(1) The checking method is based on signature nodes and edge signs, covering both the nodes and the edges of the control flow graph. It is more powerful than the typical checking method and eliminates that method's misjudgment of illegal branches and its adjusting-signature conflicts.
(2) The implementation method combines compiler-inserted signatures with hardware checking: signature data are inserted into the code at compile time, and the hardware check is triggered by control flow switching instructions. This implementation offers small binary code size, high performance, and real-time checking.

(IV) A reliability evaluation model based on die area and performance overheads is proposed.
This model uses a novel reliability metric to evaluate the reliability of microprocessors accurately and quantitatively. An evaluation framework is also proposed so that the reliability of different soft error tolerant techniques can be evaluated during the design flow, and the evaluation results can guide designers in choosing appropriate techniques among various hardening methods. The soft error tolerant execution models, gate-level redundancy techniques, and architecture-level control flow checking technique described above have all been evaluated with this model.

This thesis explores soft error tolerance design for multi-core microprocessors through research on soft error tolerant execution models, gate- and architecture-level hardening techniques, and a reliability evaluation model. Experimental results demonstrate that these models and techniques are effective and can be applied in the design and implementation of soft error tolerant multi-core microprocessors.
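The DCR model in (I)(1) can be illustrated with a small behavioral simulation of checkpoint-and-rollback. This is a minimal sketch under stated assumptions, not the thesis's design: the two cores are simulated one after the other, they are compared by checking their full state at fixed checkpoint intervals (the abstract does not say what is sent through the inter-core FIFO), and the toy instruction `add`, the injector `make_transient_fault`, and `run_dcr` are all hypothetical.

```python
# Hypothetical sketch of dual-core redundancy with context saving and recovery;
# not the thesis's implementation.
import copy
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]                          # toy architectural state: register -> value
Step = Callable[[State], None]                  # one toy "instruction" that mutates the state
FaultHook = Callable[[int, int, State], None]   # hypothetical injector: (core_id, step, state)


def add(reg: str, k: int) -> Step:
    """Hypothetical toy instruction: reg += k."""
    def step(s: State) -> None:
        s[reg] += k
    return step


def make_transient_fault(core_id: int, step: int, reg: str, bit: int) -> FaultHook:
    """Hypothetical one-shot bit flip: a transient fault that does not recur on re-execution."""
    fired = False

    def hook(c: int, i: int, s: State) -> None:
        nonlocal fired
        if not fired and c == core_id and i == step:
            s[reg] ^= 1 << bit
            fired = True
    return hook


def run_dcr(program: List[Step], init: State, interval: int,
            fault: Optional[FaultHook] = None, max_retries: int = 3) -> Tuple[State, int]:
    """Run two redundant copies of `program`, comparing state every `interval` steps.

    On agreement the state becomes the new saved context; on mismatch both copies
    restore the last saved context and re-execute the segment.
    Returns (final state, number of rollbacks).
    """
    saved = copy.deepcopy(init)          # last context known to be error-free
    rollbacks, retries, pc = 0, 0, 0
    while pc < len(program):
        end = min(pc + interval, len(program))
        cores = [copy.deepcopy(saved), copy.deepcopy(saved)]
        for core_id, state in enumerate(cores):
            for i in range(pc, end):
                program[i](state)
                if fault is not None:
                    fault(core_id, i, state)
        if cores[0] == cores[1]:
            saved, pc, retries = cores[0], end, 0   # commit segment: save context
        else:
            rollbacks += 1                           # discard segment, restore saved context
            retries += 1
            if retries > max_retries:
                raise RuntimeError("persistent mismatch: not a transient fault")
    return saved, rollbacks
```

With `prog = [add("r0", 1), add("r1", 2), add("r0", 3), add("r1", 4)]`, calling `run_dcr(prog, {"r0": 0, "r1": 0}, 2, make_transient_fault(1, 2, "r0", 3))` rolls back once and still returns `{"r0": 4, "r1": 6}`. The `max_retries` guard encodes the assumption that a soft error is transient; a mismatch that survives re-execution points to a fault that rollback alone cannot recover.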
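For the TCR model in (I)(2), the central operation is a 2-of-3 vote that both masks a soft error and identifies which core produced it. The sketch below models only that step. The abstract does not say how TCR reconfigures around the failed core, so reconfiguration is represented here simply by reporting the dissenting core; `vote`, `run_tcr`, and the fault-injection table are hypothetical.

```python
# Hypothetical sketch of triple-core majority voting with failed-core identification;
# the reconfiguration mechanism itself is not modeled.
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def vote(outputs: List[int]) -> Tuple[int, Optional[int]]:
    """2-of-3 majority vote over three core outputs.

    Returns (majority value, index of the dissenting core or None).
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one core is wrong")
    dissent = next((i for i, v in enumerate(outputs) if v != value), None)
    return value, dissent


def run_tcr(inputs: List[int], compute: Callable[[int], int],
            faults: Dict[Tuple[int, int], int]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Run `compute` on three simulated cores for each input and vote on the results.

    `faults` is a hypothetical injection table: (step, core) -> XOR mask on that core's output.
    Returns the voted results and the (step, core) pairs the voter masked, i.e. the
    cores a reconfiguration step would have to isolate or resynchronize.
    """
    results: List[int] = []
    masked: List[Tuple[int, int]] = []
    for step, x in enumerate(inputs):
        outs = [compute(x) ^ faults.get((step, core), 0) for core in range(3)]
        value, bad = vote(outs)
        results.append(value)
        if bad is not None:
            masked.append((step, bad))
    return results, masked
```

For example, `run_tcr([1, 2, 3], lambda x: x * x, {(1, 2): 0b100})` returns `([1, 4, 9], [(1, 2)])`: the corrupted output of core 2 at step 1 is outvoted and flagged.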
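The masking idea behind the C-element DMR in (II)(1) can be shown at the logic-value level: a Muller C-element passes its inputs only when they agree, so an upset in one of two redundant copies makes it hold its last value instead of propagating the flipped bit. This is a behavioral sketch, not a circuit; it assumes single-bit storage, an initial output of 0, and upsets confined to one copy, and `dmr_outputs` is hypothetical.

```python
# Hypothetical behavioral sketch of dual modular redundancy with a C-element;
# not the thesis's circuit.
from typing import Iterable, List, Set


class CElement:
    """Muller C-element: the output follows the inputs when they agree
    and holds its previous value when they differ."""

    def __init__(self, init: int = 0) -> None:
        self.out = init

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a
        return self.out


def dmr_outputs(data: Iterable[int], seu_cycles: Set[int]) -> List[int]:
    """Store each bit of `data` in two redundant copies feeding a C-element.

    `seu_cycles` lists the (hypothetical) cycles in which copy B is flipped by an SEU.
    An upset while the stored value is stable is fully masked; in this model an
    upset in the same cycle as a new write leaves the old value on the output
    for that cycle.
    """
    c = CElement(init=0)
    outs: List[int] = []
    for t, d in enumerate(data):
        a = d                                  # copy A: correct stored bit
        b = d ^ 1 if t in seu_cycles else d    # copy B: flipped by an SEU in this cycle
        outs.append(c.update(a, b))
    return outs
```

`dmr_outputs([1, 1, 1, 0, 0], {1, 4})` returns `[1, 1, 1, 0, 0]`, identical to the stored data, because both upsets strike while the value is stable.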
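The temporal side of TSTMR in (II)(2) rests on a timing argument: if three redundant registers sample the same signal at instants spaced further apart than the transient pulse, a single SET can corrupt at most one sample, and a majority vote removes it. The sketch below checks that argument with a behavioral timing model. It does not model the DCTREG circuit or its two clocks; `with_set`, `tstmr_sample`, and the sample spacing `delta` are assumptions.

```python
# Hypothetical behavioral timing model of temporal-spatial TMR;
# not the DCTREG circuit.
from typing import Callable

Signal = Callable[[float], int]   # value of a combinational output as a function of time


def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def with_set(value: int, start: float, width: float, mask: int) -> Signal:
    """Hypothetical signal carrying `value`, with a single event transient that
    XORs `mask` into it during [start, start + width)."""
    def signal(t: float) -> int:
        return value ^ mask if start <= t < start + width else value
    return signal


def tstmr_sample(signal: Signal, t0: float, delta: float) -> int:
    """Sample the signal at t0, t0 + delta and t0 + 2*delta (one sample per
    redundant register) and vote. A SET narrower than `delta` can corrupt at
    most one sample; an SEU flipping one stored sample is masked by the same vote."""
    samples = [signal(t0 + k * delta) for k in range(3)]
    return majority3(*samples)
```

With `sig = with_set(0b1010, 1.0, 0.3, 0b0010)`, sampling 0.5 apart (`tstmr_sample(sig, 1.0, 0.5)`) masks the glitch and yields `0b1010`, whereas sampling 0.1 apart lets the glitch reach all three samples and the vote outputs the corrupted `0b1000`. The sampling offset therefore has to exceed the widest SET the design is meant to tolerate.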
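The abstract contrasts ECFC in (III) with a "typical checking method" that uses adjusting signatures. Assuming this refers to CFCSS-style signature checking (a signature per basic block, a signature difference stored at each block, and a run-time adjusting signature D for blocks with several predecessors), the sketch below implements that baseline, including the adjusting-signature conflict that ECFC is said to eliminate. It is not the thesis's ECFC, whose edge signs are not detailed in the abstract; `compile_signatures` and `check_path` are hypothetical, and the signature values 1, 2, 3, ... are arbitrary.

```python
# Hypothetical sketch of baseline CFCSS-style control flow checking;
# not the thesis's ECFC technique.
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]   # basic block -> successor blocks (every block must be a key)


def compile_signatures(cfg: Graph) -> Tuple[Dict[str, int], Dict[str, int],
                                            Dict[str, int], Dict[str, List[str]]]:
    """Compile-time step: assign block signatures, signature differences,
    and adjusting signatures (D) for fan-in blocks."""
    sig = {n: i + 1 for i, n in enumerate(cfg)}          # unique per-block signatures
    preds: Dict[str, List[str]] = {n: [] for n in cfg}
    for u, succs in cfg.items():
        for v in succs:
            preds[v].append(u)
    diff: Dict[str, int] = {}      # d_v = s_base ^ s_v, embedded at the start of block v
    adjust: Dict[str, int] = {}    # D value that block u sets before branching
    for v, ps in preds.items():
        if not ps:
            continue               # entry block: G is initialized to sig[v] instead
        base = ps[0]
        diff[v] = sig[base] ^ sig[v]
        if len(ps) > 1:            # fan-in block: each predecessor u must set D = s_base ^ s_u
            for u in ps:
                need = sig[base] ^ sig[u]
                if adjust.setdefault(u, need) != need:
                    raise ValueError(f"adjusting-signature conflict at block {u}")
    return sig, diff, adjust, preds


def check_path(path: List[str], sig: Dict[str, int], diff: Dict[str, int],
               adjust: Dict[str, int], preds: Dict[str, List[str]]) -> bool:
    """Run-time step: update the signature register G on every control transfer
    and compare it with the destination block's signature."""
    g = sig[path[0]]
    for u, v in zip(path, path[1:]):
        g ^= diff.get(v, 0)
        if len(preds[v]) > 1:
            g ^= adjust.get(u, 0)  # D as left by the (possibly illegal) source block
        if g != sig[v]:
            return False           # control flow error detected
    return True
```

For the if-then-else graph `{"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}`, both legal paths pass and an illegal jump from A to D is caught. For a graph in which one block feeds two join blocks that have different first predecessors, such as `{"A": ["B", "C", "D"], "B": ["E"], "C": ["D", "E"], "D": [], "E": []}`, `compile_signatures` raises the adjusting-signature conflict, because block C would need two different D values.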
Keywords/Search Tags: Multi Core Microprocessor, Soft Error Tolerance, Execution Model, Gate Level Redundancy, Control Flow Checking, Reliability Evaluation