AQuA: A framework for providing adaptive fault tolerance to distributed applications

Posted on: 2002-04-07
Degree: Ph.D
Type: Thesis
University: University of Illinois at Urbana-Champaign
Candidate: Ren, Yansong
Full Text: PDF
GTID: 2468390011997230
Subject: Engineering
Abstract/Summary:
Dependable distributed systems are difficult to build. This is particularly true if they have dependability requirements that change during the execution of an application and are built with commercial off-the-shelf hardware. In that case, fault tolerance can be achieved using middleware, and mechanisms must be provided to communicate the dependability requirements of a distributed application to the system and to adapt the system's configuration to achieve the desired dependability.

In this thesis, we developed and implemented the AQuA architecture, which allows a distributed CORBA application to request a desired level of dependability during the application's runtime. AQuA provides fault-tolerance mechanisms to ensure that a CORBA client can obtain reliable services, even if the CORBA server object that provides the desired services suffers from crash failures and value faults.

AQuA includes a replicated dependability manager that provides dependability management by configuring the system in response to applications' requests and changes in system resources due to faults. It uses Maestro/Ensemble to provide group communication services, and it provides applications with different types of groups so that they can achieve better scalability as well as dependability. The AQuA gateway is designed to intercept standard CORBA IIOP messages to allow any standard CORBA application to use AQuA. The gateway also includes a set of handlers that implement replication schemes to forward messages reliably to the remote replicated objects.

The AQuA architecture includes several different types of active and passive replication schemes to provide fault tolerance. All of the replication schemes ensure strong data consistency among replicas. Each replication scheme includes a communication scheme that provides reliable message transmission, totally ordered messages across replicas, and automatic recovery from faults. Three types of active replication schemes were developed for AQuA: active replication with pass-first, active replication with leader-only, and active replication with majority voting. The first two schemes are able to tolerate crash failures. The third scheme is able to tolerate both crash failures and value faults, and also provides a technique to ensure that voting can be done correctly even if both the group membership and the majority size change dynamically.

Performance measurements were taken for both the active and passive replication schemes, with different numbers of replicas and various message lengths. In addition, fault detection and recovery times with different replication schemes under various failure situations were studied.
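
The abstract does not describe how the majority voting scheme is implemented, so the following is only a minimal sketch of the general idea: a voter collects replies from replicas, and the majority threshold is recomputed whenever the group membership changes. The class name `MajorityVoter`, its methods, and the rule of discarding replies from departed replicas are all assumptions made for illustration. They are not taken from AQuA, Maestro/Ensemble, or any CORBA API.

```python
from collections import Counter


class MajorityVoter:
    """Hypothetical voter for one request; not the AQuA implementation."""

    def __init__(self, members):
        self.members = set(members)   # current group view (assumed input)
        self.replies = {}             # replica id -> reply value (hashable)

    def majority_size(self):
        # Strict majority of the current view, so it shrinks or grows
        # together with the membership.
        return len(self.members) // 2 + 1

    def on_view_change(self, new_members):
        # Assumption: replies from replicas that left the group are dropped,
        # then the vote is re-evaluated against the new majority size.
        self.members = set(new_members)
        self.replies = {r: v for r, v in self.replies.items()
                        if r in self.members}
        return self.decide()

    def on_reply(self, replica, value):
        # Ignore replies from non-members; count only the first reply
        # from each replica.
        if replica not in self.members:
            return None
        self.replies.setdefault(replica, value)
        return self.decide()

    def decide(self):
        # Return the value backed by a majority of the current view,
        # or None if no value has reached the threshold yet.
        if not self.replies:
            return None
        value, count = Counter(self.replies.values()).most_common(1)[0]
        return value if count >= self.majority_size() else None
```

In this sketch, a group of five replicas needs three matching replies. If two replicas crash before replying and the view shrinks to three, the threshold drops to two, so two matching replies already received are enough to outvote a single faulty value.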
Keywords/Search Tags:Replication schemes, Fault, Distributed, Aqua, Dependability, Application, CORBA, Different