The Research And Implementation Of Object-Oriented Fault-Tolerant Middleware

Posted on: 2003-02-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M H Zhou
GTID: 1118360092498837
Subject: Computer Science and Technology
Abstract/Summary:
As distributed computing systems are deployed in a widening range of critical domains, the demand for high reliability and high availability grows ever more urgent. Because fault tolerance is an effective means of improving both, enhancing the fault-tolerant ability of distributed systems has become a focus of research. However, distributed fault-tolerant applications are hard to develop, since programmers must handle not only complex business logic but also intricate fault tolerance logic, which is often tedious and error-prone. There is therefore growing interest in a unified platform that supports the development of distributed fault-tolerant applications.

The development of distributed applications benefits greatly from object-oriented distributed middleware, with its convenient development environment and transparent communication mechanisms. Building the fault tolerance support platform on such middleware therefore simplifies the development of the platform itself and also greatly helps the development, operation and maintenance of complex fault-tolerant distributed applications. A fault tolerance support platform built on middleware is called fault-tolerant middleware.

This thesis aims at making four major contributions:

Firstly, the thesis analyses recent research efforts toward building fault-tolerant CORBA systems, which fall into three approaches: the integration approach, the service approach and the interception approach. To address the flaws of these approaches, a novel computing model for fault-tolerant CORBA and its management framework are proposed, named FTOUM and UMRO respectively.
By combining the service and interception approaches, the new model implements fault tolerance management efficiently and provides good interoperability and transparency for clients.

Secondly, based on the FTOUM computing model, the thesis presents a new fault tolerance algorithm, ORAML, which uses both checkpointing and module replication. It employs flexible configuration management mechanisms to implement dynamic replication, introduces fault tolerance policies on the client side so that clients take an active part in the fault tolerance process, and makes that process completely transparent to clients while separating the replication protocol from the communication protocol.

Thirdly, the thesis focuses on three fault tolerance management issues that fault-tolerant middleware must address: object replication, fault detection and notification, and state logging and recovery. Accordingly, three mechanisms are proposed: replication management, fault management, and logging and recovery management. To handle the life cycle management of object groups and the management of fault tolerance properties, an object factory model is designed to load objects dynamically, two member creation patterns are introduced, a set of fault tolerance properties is defined according to the common characteristics of fault-tolerant systems, and a hierarchical model is proposed to enable dynamic and flexible configuration. The key problems of fault management are fault detection and notification. A notification model is introduced to deliver fault events efficiently. To cover entities at different levels, such as object, process and host, a tree-structured fault detection model is adopted to enhance scalability. Finally, a ring-check algorithm is proposed to resolve the single-failure problem.
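The abstract does not spell out how the ring-check algorithm works. As a rough illustration only, the sketch below shows one common way a ring arrangement can avoid relying on a single fault detector: each detector watches its successor on the ring, so the failure of any one detector is noticed by its predecessor. The function name `ring_check`, the detector identifiers and the `is_alive` probe are all hypothetical and are not taken from the thesis.

```python
from typing import Callable, Dict, List


def ring_check(detectors: List[str],
               is_alive: Callable[[str], bool]) -> Dict[str, str]:
    """Run one round of a generic ring check (illustrative assumption).

    Each live detector probes its successor on the ring. The result maps
    a reporting detector to the successor it found unresponsive.
    """
    reports: Dict[str, str] = {}
    n = len(detectors)
    for i, monitor in enumerate(detectors):
        if not is_alive(monitor):
            continue  # a failed detector cannot probe or report
        successor = detectors[(i + 1) % n]  # wrap around to close the ring
        if not is_alive(successor):
            reports[monitor] = successor
    return reports


# Hypothetical usage: four detectors on a ring, with "d2" crashed.
ring = ["d0", "d1", "d2", "d3"]
down = {"d2"}
suspects = ring_check(ring, lambda name: name not in down)
```

In this usage, only `d1` files a report, because `d2` is its successor; no central monitor is needed for the failure to be observed.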
By combining the service and interception approaches, the thesis presents logging and recovery mechanisms for object state recovery. They effectively implement logging and recovery for different replication styles and guarantee the consistency of state transfer between replicas.

The fina...
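The logging and recovery idea mentioned in the abstract can be illustrated with a generic checkpoint-plus-message-log scheme: a replica's state is saved periodically, invocations made after the checkpoint are logged, and a recovering replica restores the checkpoint and replays the log. This is a minimal sketch under that assumption and is not the mechanism defined in the thesis; `Counter`, `RecoveryLog`, `get_state` and `set_state` are hypothetical names chosen for the example.

```python
import copy
from typing import Any, Dict, List, Optional, Tuple


class Counter:
    """Toy replicated object whose state is a single integer."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, n: int) -> int:
        self.value += n
        return self.value

    def get_state(self) -> Dict[str, Any]:
        return {"value": self.value}

    def set_state(self, state: Dict[str, Any]) -> None:
        self.value = state["value"]


class RecoveryLog:
    """Keeps the latest checkpoint plus the invocations made after it."""

    def __init__(self) -> None:
        self.checkpoint: Optional[Dict[str, Any]] = None
        self.messages: List[Tuple[str, tuple]] = []

    def invoke(self, replica: Any, method: str, *args: Any) -> Any:
        # Log the invocation first, then apply it to the replica.
        self.messages.append((method, args))
        return getattr(replica, method)(*args)

    def take_checkpoint(self, replica: Any) -> None:
        # Logged messages before a checkpoint are no longer needed.
        self.checkpoint = copy.deepcopy(replica.get_state())
        self.messages.clear()

    def recover(self, replica: Any) -> None:
        # Restore the last checkpoint, then replay later invocations in order.
        if self.checkpoint is not None:
            replica.set_state(copy.deepcopy(self.checkpoint))
        for method, args in self.messages:
            getattr(replica, method)(*args)


# Hypothetical usage: a backup catches up with the primary.
log = RecoveryLog()
primary = Counter()
log.invoke(primary, "add", 5)
log.take_checkpoint(primary)
log.invoke(primary, "add", 3)
log.invoke(primary, "add", 2)

backup = Counter()
log.recover(backup)
```

After `recover`, the backup holds the same state as the primary, which is the consistency property that state transfer between replicas is meant to preserve.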
Keywords/Search Tags: Fault-tolerant middleware, Distributed object, Object replication, Fault detection and notification, Logging and recovery, CORBA