Dependability analysis of fault-tolerant multiprocessor architectures through simulated fault injection

Posted on: 1994-08-27
Degree: Ph.D.
Type: Dissertation
University: University of Massachusetts Amherst
Candidate: Clark, Jeffrey Alan
Full Text: PDF
GTID: 1478390014992447
Subject: Engineering
Abstract/Summary:
This dissertation develops a new approach for evaluating the dependability of fault-tolerant computer systems. Dependability has traditionally been evaluated through combinatorial and Markov modeling. These analytical techniques have several limitations that can restrict their applicability. Simulation avoids many of these limitations, allowing a more precise representation of system attributes than is feasible with analytical modeling. However, the computational demands of simulating a system in detail, at a low abstraction level, currently prohibit the evaluation of high-level dependability metrics such as reliability and availability. The new approach abstracts a system at the architectural level and employs life testing through simulated fault injection to measure dependability accurately and efficiently. The simulation models needed to implement this approach have been derived and integrated into a generalized software testbed called the REliable Architecture Characterization Tool (REACT).

The effectiveness of REACT is demonstrated through the analysis of several alternative fault-tolerant multiprocessor architectures. Specifically, two dependability tradeoffs associated with triple-modular redundant (TMR) systems are investigated. The first explores the reliability-performance tradeoff made by voting unidirectionally, instead of bidirectionally, on either memory read or memory write accesses. The second examines the reliability-cost tradeoff made by duplicating, rather than triplicating, memory modules and comparing their outputs via error-detecting codes. Both studies show that, in many cases, acceptably little reliability is sacrificed for potentially large performance increases or cost reductions relative to the original TMR system design.
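The idea of life testing through simulated fault injection can be illustrated with a small Monte Carlo sketch. This is not REACT and does not reproduce the dissertation's models; it assumes a textbook non-repairable TMR system with a perfect voter and three identical modules whose lifetimes are independent and exponentially distributed. All function names and parameter values are hypothetical, chosen for illustration. Each trial injects one permanent fault per module at a random time and records whether the system (which fails at the second module failure) survives the mission. The estimate is compared with the standard closed form R(t) = 3r^2 - 2r^3, where r = exp(-rate * t).

```python
import math
import random


def tmr_failure_time(rate, rng):
    """Return the failure time of one simulated TMR system.

    Assumption: a perfect voter masks a single faulty module, so the
    system fails when the second of its three modules fails.
    """
    # Inject one permanent fault per module at an exponentially
    # distributed time (constant failure rate, an assumption).
    lifetimes = sorted(rng.expovariate(rate) for _ in range(3))
    return lifetimes[1]


def estimate_reliability(rate, mission_time, trials, seed=0):
    """Life test: fraction of simulated systems still working at mission_time."""
    rng = random.Random(seed)
    survived = sum(
        tmr_failure_time(rate, rng) > mission_time for _ in range(trials)
    )
    return survived / trials


def analytic_tmr_reliability(rate, mission_time):
    """Closed-form TMR reliability under the same assumptions."""
    r = math.exp(-rate * mission_time)  # reliability of a single module
    return 3 * r**2 - 2 * r**3
```

With a hypothetical failure rate of 1e-3 per hour and a 500-hour mission, `analytic_tmr_reliability(1e-3, 500.0)` evaluates to about 0.657, and `estimate_reliability(1e-3, 500.0, 20000)` should land close to that value. In a case this simple the closed form makes simulation unnecessary; the simulated life test becomes useful when architectural details, such as voting direction or duplicated memory with error-detecting codes, make an analytical model impractical.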
Keywords/Search Tags:Dependability, Fault-tolerant, System