Research On Compile Techniques Of Fault Tolerance For Soft Errors

Posted on: 2011-02-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Xu
GTID: 1118330332486963
Subject: Computer Science and Technology
Abstract/Summary:
Reliability has been one of the central concerns of computer science since the first computers were built. Although modern computers are far more reliable than their predecessors, increasingly complex manufacturing processes and an ever-wider range of applications keep raising new reliability challenges.

Soft errors are transient faults in semiconductor circuits caused by external radiation or electrical noise, such as high-energy neutrons from cosmic rays, power glitches, and electromagnetic interference. Soft errors induced by cosmic radiation, e.g. single event upsets, are a persistent threat to the reliability of space computers. Moreover, as the scaling of VLSI technologies continues to raise performance, modern microprocessors are becoming more susceptible to soft errors. Following the performance wall and the power wall, the dependability of computing in the presence of soft errors has emerged as a growing concern. Because register files (RFs) are accessed very frequently and are hard to protect well, soft errors occurring in them are one of the main causes of reduced program reliability.

Compared with hardware-implemented fault tolerance for soft errors, software-implemented methods are attractive for their lower cost and greater flexibility. To address soft errors occurring in RFs, this dissertation focuses on techniques for program analysis, error detection, and compiler optimization. The main work is divided into the following five parts:

1. Analyzing the impact of soft errors in RFs from the perspective of the program is valuable, because it is the foundation for implementing efficient fault tolerance techniques.
Based on assembly code, this dissertation proposes a static approach, named ASER, that quantitatively analyzes the effect of soft errors on the reliability of a given program. Building on a previous static method, ASER calculates the probability that each register is live using an inter-procedural analysis framework of summary functions, which improves the final accuracy. The concrete live ranges of registers are then sketched via a graph reachability method. Analytical experiments show that the reliability of a program is connected to its native structure. Moreover, the critical factors of all involved live ranges are presented, which identify the vulnerabilities of a program when soft errors occur in RFs. These contributions support the implementation of efficient algorithms for tolerating soft errors.

2. ECC is one of the most powerful and popular architectural error protection mechanisms for mitigating soft errors. However, it is difficult to fully protect RFs with ECC because of the significant penalty in power, area, and possibly performance. This dissertation assumes that the register file is only partially protected by ECC and presents a register reassignment method, named RAPP. First, the register interference graph is constructed from ASER's analysis of register live ranges. Then, through a hierarchical graph coloring algorithm, the ECC-protected registers are assigned to the most critical live ranges. Experimental results show that, compared with other available partial protection methods, RAPP improves program reliability significantly while taking the power overhead into account.

3. To address the data flow errors caused by soft errors, instruction-level duplication techniques have been widely used because they are flexible and general to implement and have a strong capacity for error detection.
However, the large number of consistency check instructions fundamentally limits program performance. This dissertation presents a checkpoint optimization method for instruction duplication, named COID. Based on data flow analysis of error propagation, this method tries to remove redundant comparison instructions within the boundaries delimited by system call instructions without affecting the error detection rate. To illustrate the effectiveness of this method, we perform several fault injection experiments and performance evaluations on a set of simple benchmark programs. Experimental results indicate that COID improves the average performance of instruction duplication by 12.78% without degrading the error detection rate.

4. Control flow errors are a major effect of soft errors. Currently available control flow checking methods are deficient in performance overhead and checking capacity. Using the control flow graph of the program, basic blocks are first categorized by a graph coloring algorithm. Then an effective control flow checking method, named ECCFS, is presented based on the formatted signatures of basic blocks. Moreover, extended solutions are proposed for intra-block and inter-procedural control flow checking, respectively. The analytical results on checking capacity and the experimental results of fault injection indicate that ECCFS can detect most control flow errors. Compared with typical control flow checking methods, ECCFS has advantages in error detection rate and performance overhead.

5. A variety of methods have been proposed to address the effects of soft errors. Unfortunately, these techniques incur performance penalties, storage overhead, and economic costs to different degrees. To enhance the runtime reliability of programs without extra cost, the dissertation presents a compiler optimization method, named SISER.
Its basic idea is to reduce the total susceptible intervals that may be affected by soft errors during execution by rearranging the code execution flow. Based on the analytical results of ASER, a detailed basic block scheduling algorithm is presented in the form of dynamic programming. Experimental results indicate that the average reliability of programs is improved by about 4.41%. SISER introduces no noticeable extra overhead, which is its outstanding characteristic compared with other traditional fault tolerance methods.
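The idea behind part 3 (checking duplicated computations only where data leaves the program) can be illustrated with a minimal Python sketch. It is not the dissertation's COID algorithm: the toy instruction set, the `Instr` class, the `run` function, and the single-bit-flip fault model are all assumptions made for this illustration. The sketch keeps a primary and a shadow copy of every register and compares them either after every instruction or only before an `out` instruction, which stands in for a system call.

```python
from dataclasses import dataclass
import operator

# Hypothetical three-address toy ISA, invented for this sketch only.
OPS = {"add": operator.add, "mul": operator.mul}


@dataclass
class Instr:
    op: str        # "add", "mul", or "out" (stand-in for a system call)
    dst: str = ""  # destination register (unused by "out")
    a: str = ""    # first source register (the value emitted by "out")
    b: str = ""    # second source register (unused by "out")


class SoftErrorDetected(Exception):
    """Raised when the primary and shadow copies disagree."""


def run(program, regs, check_every_instr, flip=None):
    """Run `program` on duplicated register state.

    flip = (index, reg, bit) flips one bit of the primary copy of `reg`
    right after instruction `index`, modelling a soft error in the RF.
    Returns (outputs, number_of_comparisons_executed).
    """
    primary, shadow = dict(regs), dict(regs)
    outputs, checks = [], 0
    for i, ins in enumerate(program):
        if ins.op == "out":
            # Boundary check: always compare before data leaves the program.
            checks += 1
            if primary[ins.a] != shadow[ins.a]:
                raise SoftErrorDetected(f"mismatch in {ins.a} before output")
            outputs.append(primary[ins.a])
        else:
            fn = OPS[ins.op]
            primary[ins.dst] = fn(primary[ins.a], primary[ins.b])
            shadow[ins.dst] = fn(shadow[ins.a], shadow[ins.b])
            if check_every_instr:
                # Unoptimized duplication: compare after every computation.
                checks += 1
                if primary[ins.dst] != shadow[ins.dst]:
                    raise SoftErrorDetected(f"mismatch in {ins.dst}")
        if flip is not None and flip[0] == i:
            primary[flip[1]] ^= 1 << flip[2]  # inject the single-bit upset
    return outputs, checks


PROGRAM = [
    Instr("add", "r1", "r0", "r0"),
    Instr("mul", "r2", "r1", "r1"),
    Instr("add", "r3", "r2", "r0"),
    Instr("out", a="r3"),
]
REGS = {"r0": 3}
```

In this toy program, `run(PROGRAM, REGS, True)` executes four comparisons and `run(PROGRAM, REGS, False)` executes one, and both modes raise `SoftErrorDetected` when a bit of `r1` is flipped after the first instruction. Under this simplified model, a corrupted value that never reaches an `out` instruction goes unchecked in the boundary-only mode, but it also cannot change the program's output.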
Keywords/Search Tags: Soft Errors in Register Files, Software Fault Tolerance, Program Analysis, Instruction Duplication, Compile Optimization