
Robust integration of multi-level fault detection mechanisms and recovery mechanisms in a component-based support middleware model for fault-tolerant real-time distributed computing

Posted on: 2010-05-08
Degree: Ph.D.
Type: Dissertation
University: University of California, Irvine
Candidate: Zhou, Qian
Full Text: PDF
GTID: 1448390002476976
Subject: Engineering
Abstract/Summary:
Many modern applications demand a high degree of software reliability, service availability, and guaranteed timeliness for critical task executions. Because these applications have grown rapidly in variety and functional complexity, the existing methods and tools for developing sizable fault-tolerant (FT) real-time (RT) distributed computing (DC) applications have become insufficient.

The ROAFTS (Real-time Object-oriented Adaptive Fault Tolerance Support) middleware model has been evolving in the UCI DREAM Laboratory over the past decade as a reliable execution engine model for FT RT DC applications. ROAFTS integrates various mechanisms for fault detection and recovery in a form that meshes with high-level RT DC component-based programming schemes, in particular the TMO (Time-triggered Message-triggered Object) programming scheme. ROAFTS is the first component-based support middleware model for FT RT DC. The previous ROAFTS model, however, has weaknesses in several important areas. The major drawbacks include: (a) the incorporated network surveillance scheme does not consider network partitioning failures or network merging; (b) message transmission failures are not handled with sufficient rigor; (c) the mechanism for collecting and processing failure reports from different parts of the system is weak, and its completeness and correctness were not rigorously established.

The work reported in this dissertation establishes a complete and robust middleware model by taking the previous ROAFTS as a starting point and enhancing and integrating multi-level fault detection and recovery mechanisms. The resulting middleware model is called ROAFTS II.
This dissertation presents: (a) an improved RT fault detection scheme, SNS (Supervisor-based Network Surveillance) II, which is capable of detecting network partition events and network merges and of locating fault sources; (b) a reliable messaging protocol, called RMP, for fast detection and masking of message losses due to transient faults occurring on the communication paths; (c) a template-based implementation technique that enables TMO application developers to easily implement primary-shadow TMO replicas; (d) a new mechanism for fault report handling and system reconfiguration; and (e) an extension software layer for managing home network devices. Each of these contributions facilitates the analysis of fault detection latency bounds.

An experimental evaluation has been conducted. In light of its results, the middleware model presented here represents an important step towards establishing a solid foundation for cost-effective development of FT RT DC applications.
Keywords/Search Tags: Model, RT DC, Fault detection, FT RT, Applications, Mechanisms, ROAFTS, Recovery