Fault-tolerant computing for radiation environments

Posted on: 2002-03-30
Degree: Ph.D.
Type: Dissertation
University: Stanford University
Candidate: Shirvani, Philip Payman
Full Text: PDF
GTID: 1468390011496210
Subject: Engineering
Abstract/Summary:
Radiation, such as alpha particles and cosmic rays, can cause transient faults in electronic systems. Such faults cause errors called Single-Event Upsets (SEUs). SEUs are a major source of errors in electronics used in space applications. There is also a growing concern about SEUs at ground level for deep submicron technologies. In this dissertation, we compared different approaches to providing fault tolerance against radiation effects and developed new techniques for fault tolerance and radiation characterization of systems.

Estimating the SEU error rate of individual units of a digital circuit is very important in designing a fault-tolerant system. We developed a new software method that uses weighted test programs and multiple linear regression for SEU characterization of digital circuits. We also showed how errors in bistables can be distinguished from errors in combinational logic by operating a sequential circuit at different clock frequencies.

Radiation hardening is a fault avoidance technique for electronic components used in space. However, radiation-hardened components are expensive and lag behind today's commercial components in performance. Using Commercial Off-The-Shelf (COTS) components instead of radiation-hardened components has been suggested as a way to provide the higher computing power required for autonomous navigation and on-board data processing in space. We compared these two approaches in an actual space experiment. We collected errors from two processor boards, one radiation-hardened and one COTS, on board the ARGOS satellite. We designed and implemented software techniques for detecting, correcting and recovering from errors. We demonstrated that the reliability of COTS components can be enhanced by using software techniques without changing the hardware. Despite the 170% time overhead of the software techniques used on the COTS board, the throughput of the COTS board was an order of magnitude higher than that of the radiation-hardened board. The throughput of the radiation-hardened board would be the same as that of the COTS board if the radiation-hardened board had cache memory.

We also developed a new technique for tolerating permanent faults in cache memories. The main advantage of this technique is its low performance degradation even in the presence of a large number of faults.
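
As a rough illustration of the regression idea mentioned in the abstract, the sketch below assumes a simple linear model: the error rate observed while running test program i equals the sum, over circuit units j, of a utilization weight w_ij times an unknown per-unit SEU rate r_j. The abstract does not state this model explicitly, so the model, the unit names, the function name `estimate_unit_rates`, and all numbers are hypothetical and synthetic. They serve only to show how multiple linear regression can recover per-unit rates from whole-program measurements.

```python
# Minimal sketch of SEU characterization by multiple linear regression.
# ASSUMPTION: observed_rate[i] = sum_j weights[i][j] * unit_rate[j].
# Unit names, weights, and rates below are synthetic, not measured data.

def estimate_unit_rates(weights, observed):
    """Least-squares fit of per-unit rates via the normal equations
    (W^T W) r = W^T y, solved by Gaussian elimination with pivoting."""
    n = len(weights[0])
    # Build A = W^T W and b = W^T y.
    a = [[sum(row[i] * row[j] for row in weights) for j in range(n)]
         for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(weights, observed))
         for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("test programs do not separate the units")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    rates = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * rates[j] for j in range(i + 1, n))
        rates[i] = (b[i] - tail) / a[i][i]
    return rates


# Hypothetical units and four weighted test programs (one row per program).
UNITS = ["alu", "register_file", "multiplier"]
WEIGHTS = [
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.4, 0.4, 0.2],
]
# Synthetic per-unit rates used only to generate consistent observations.
TRUE_RATES = [2.0, 5.0, 1.0]
OBSERVED = [sum(w * r for w, r in zip(row, TRUE_RATES)) for row in WEIGHTS]

ESTIMATED = estimate_unit_rates(WEIGHTS, OBSERVED)
for name, rate in zip(UNITS, ESTIMATED):
    print(f"{name}: {rate:.3f} errors per unit time")
```

The test programs must exercise the units in sufficiently different proportions. If two units always carry the same weight, the normal equations become singular and the sketch raises an error instead of returning an arbitrary split between them.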
Keywords/Search Tags: Fault, Radiation, COTS board, Errors