Automated fault-injection-based dependability analysis of distributed computer systems

Posted on: 2002-04-29
Degree: Ph.D.
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Stott, David Thomas
Full Text: PDF
GTID: 1468390011498916
Subject: Computer Science
Abstract/Summary:
As we become more dependent on the reliable operation of critical computer systems, the need to test these systems becomes increasingly important. Though several tools have been proposed for using fault injection to conduct dependability analyses, limitations in these tools make them inadequate for some important areas of analysis. One such area is comparison studies (or dependability benchmarking); another is the analysis of complex, distributed computer systems. This research examines how fault injection can be applied in these areas without developing a custom tool for each new target system or fault model. The contribution of this research has two main aspects. The first is the design and development of a tool, NFTAPE (Networked Fault Tolerance and Performance Evaluator), which presents a novel approach to developing fault injectors. The second is the set of studies conducted using NFTAPE and their results.
NFTAPE differs from other fault-injection tools in that it separates the components that inject faults or trigger faults (called Lightweight Fault Injectors, LWFIs, and Lightweight Triggers, LWTs) from the rest of the tool. Though this is a simple change, it substantially improves our ability to analyze systems without spending considerable resources customizing a new tool for each new system or experiment. It separates the system-specific components (LWFIs and LWTs) from the portable components (NFTAPE's common control mechanism), allowing the control mechanism to be reused for each experiment. It also provides extensibility by allowing new LWFIs and LWTs to be added to support new fault models (which are often specific to the target system). NFTAPE also includes an API to facilitate writing LWFIs and LWTs and a scripting language for writing campaign specifications.
Two of the many NFTAPE studies are a comparison of Voltan and Chameleon (two reliable middleware products) and an analysis of a computer system for running distributed scientific applications in space.
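The separation described in the abstract, system-specific injectors and triggers plugged into a reusable control mechanism, can be illustrated with a minimal Python sketch. This is not NFTAPE's actual API or scripting language, which the abstract does not specify; every name here (`LightweightTrigger`, `LightweightFaultInjector`, `StepTrigger`, `BitFlipInjector`, `Controller`) is a hypothetical stand-in, and the "target system" is just a dictionary.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class LightweightTrigger(ABC):
    """Hypothetical system-specific component: decides WHEN to inject."""

    @abstractmethod
    def should_fire(self, step: int) -> bool: ...


class LightweightFaultInjector(ABC):
    """Hypothetical system-specific component: decides HOW to inject."""

    @abstractmethod
    def inject(self, target: dict) -> str: ...


class StepTrigger(LightweightTrigger):
    """Fires once, at a fixed step of the experiment."""

    def __init__(self, fire_at: int) -> None:
        self.fire_at = fire_at

    def should_fire(self, step: int) -> bool:
        return step == self.fire_at


class BitFlipInjector(LightweightFaultInjector):
    """Flips one bit of an integer value held by the toy target."""

    def __init__(self, key: str, bit: int) -> None:
        self.key = key
        self.bit = bit

    def inject(self, target: dict) -> str:
        target[self.key] ^= 1 << self.bit
        return f"flipped bit {self.bit} of {self.key}"


@dataclass
class Controller:
    """Portable control loop, reused unchanged across experiments.

    It knows nothing about the target or the fault model; it only
    asks the trigger whether to fire and delegates to the injector.
    """

    trigger: LightweightTrigger
    injector: LightweightFaultInjector
    log: list = field(default_factory=list)

    def run(self, target: dict, steps: int) -> list:
        for step in range(steps):
            if self.trigger.should_fire(step):
                self.log.append((step, self.injector.inject(target)))
        return self.log


if __name__ == "__main__":
    target = {"counter": 0}
    controller = Controller(StepTrigger(fire_at=2), BitFlipInjector("counter", bit=3))
    print(controller.run(target, steps=5), target)
```

Under these assumptions, supporting a new fault model or target means writing another small injector or trigger class, while `Controller` stays untouched, which is the reuse argument the abstract makes.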
Keywords/Search Tags:Computer, System, Fault, Distributed, NFTAPE, Dependability