The Research On Low-overhead Rollback Recovery Fault-Tolerance Technology

Posted on: 2006-05-10 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J M Yang
GTID: 1118360152470084 | Subject: Computer application technology
Abstract/Summary:
Nowadays the Internet, desktop PCs and the Windows operating system are pervasive in almost every office, home and laboratory, and numerous applications run on such platforms. For such ubiquitous computing, fault tolerance is highly desirable. Furthermore, many of these applications require that fault tolerance be provided with low overhead in terms of hardware and software resources. Rollback recovery meets this demand well, offering an attractive low-overhead solution for building fault-tolerant applications. However, such applications also pose challenges to rollback recovery: both its implementation and the reduction of its overhead call for further study. This dissertation addresses practical implementation, approaches to reducing overhead, performance evaluation, and new rollback recovery protocols with desirable properties, providing support and a basis for the use of rollback recovery.
We address the problems of user-level checkpointing for multithreaded processes and introduce an implementation strategy based on virtual objects. The strategy not only simplifies some inherent issues in checkpointing but also overcomes some limitations of existing checkpoint systems. Our virtual-object scheme makes explicit the need for the wrapped member functions of a virtual object to execute atomically, and we present an approach to achieving this atomicity.
We also address the problems of implementing rollback recovery in distributed systems and introduce a unified multithreaded rollback recovery framework that supports various rollback recovery protocols. Despite the diversity of these protocols, close examination allows us to extract a set of common basic components. We set up buffers that let a process send messages, receive messages, compute, log messages and take checkpoints simultaneously.
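A minimal Python sketch shows why wrapped member functions of a virtual object must be atomic with respect to checkpointing. The abstract does not describe the dissertation's actual mechanism, so everything here is an assumption. A hypothetical `VirtualFile` object has a `write` method that updates two fields, which must stay consistent. A hypothetical `CheckpointGate` lets wrapped calls run concurrently but makes the checkpointer wait until no wrapped call is in progress. It also blocks new calls while the snapshot is taken. Only one checkpointer thread is assumed.

```python
import functools
import threading


class CheckpointGate:
    """Hypothetical gate: wrapped calls may overlap each other, but never a checkpoint.

    Assumes a single checkpointer thread calls snapshot().
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0              # wrapped member-function calls in progress
        self._checkpointing = False   # True while a snapshot is being taken

    def enter(self):
        with self._cond:
            while self._checkpointing:  # new calls wait until the snapshot is done
                self._cond.wait()
            self._active += 1

    def leave(self):
        with self._cond:
            self._active -= 1
            if self._active == 0:       # wake a checkpointer waiting for quiescence
                self._cond.notify_all()

    def snapshot(self, take_state):
        with self._cond:
            self._checkpointing = True
            while self._active > 0:     # wait for in-flight wrapped calls to finish
                self._cond.wait()
        try:
            return take_state()         # no wrapped call can run at this point
        finally:
            with self._cond:
                self._checkpointing = False
                self._cond.notify_all()


def atomic_wrapped(method):
    """Make a wrapped member function atomic with respect to checkpointing."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self._gate.enter()
        try:
            return method(self, *args, **kwargs)
        finally:
            self._gate.leave()
    return wrapper


class VirtualFile:
    """Hypothetical virtual object standing in for an OS file handle."""

    def __init__(self, gate):
        self._gate = gate
        self._lock = threading.Lock()   # serializes threads sharing this object
        self._data = bytearray()
        self._offset = 0                # must always equal len(self._data)

    @atomic_wrapped
    def write(self, chunk: bytes) -> int:
        with self._lock:
            self._data.extend(chunk)
            # A checkpoint taken between these two statements would record an
            # inconsistent object; the gate rules that out.
            self._offset += len(chunk)
        return len(chunk)

    def state(self):
        return {"data": bytes(self._data), "offset": self._offset}
```

In use, worker threads call `vf.write(...)` freely while the checkpointer calls `gate.snapshot(vf.state)`. Every snapshot then satisfies `offset == len(data)`. The per-object lock protects threads from each other, and the gate protects the checkpoint from half-finished calls. These are two separate concerns.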
Within this framework, multithreading allows four workloads to execute simultaneously within a process.
In coordinated checkpointing, checkpoint time is an important performance metric: a short checkpoint time results in a low overall overhead ratio and fast output commit. After analyzing the factors that influence checkpoint time and studying approaches to reducing it, we propose a new coordinated checkpointing protocol based on a multithreading model. It achieves a shorter checkpoint time than existing protocols.
In distributed systems, hosts may differ in failure rate, rollback overhead and the degree to which rollback is permissible. Rollback recovery should therefore meet these specific demands while retaining low overhead. We develop a rollback recovery scheme based on partitioned message logging, establish its performance model, and evaluate its average overhead ratio. The scheme is a configurable, general rollback recovery approach whose two endpoints correspond to conventional pessimistic message logging and coordinated checkpointing, respectively. Theoretical results show that the protocol overhead ratio can be reduced by choosing configuration parameters that fit the system characteristics, so the scheme can be used to optimize protocol performance.
With the scalability of rollback recovery in mind, we introduce the concept of bounding the scope of rollback. To achieve local recovery and low overhead, we introduce a three-layer model of large distributed systems in WAN environments and present a proxy-based message dependency tracking protocol.
Using the virtual-object checkpoint implementation strategy and the unified multithreaded rollback recovery framework, we develop a testbed for rollback recovery on the Windows operating system and run performance benchmarks on it. We observe that multithreading is an effective way to significantly reduce the overhead of checkpointing and message logging.
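A small Python model makes the configurable endpoints of partitioned message logging concrete. The abstract does not give the protocol's details, so the sketch assumes one plausible reading:

- Processes are partitioned into groups, and each group checkpoints in a coordinated way.
- Messages within a group are not logged.
- Messages that cross a group boundary are logged pessimistically.

The class `PartitionedLogging` and its methods are hypothetical. Under this reading, a single group reduces to coordinated checkpointing, and one process per group reduces to pessimistic message logging.

```python
from typing import Dict, Iterable, List, Set, Tuple


class PartitionedLogging:
    """Hypothetical model of partitioned message logging (assumed semantics).

    Intra-group messages are covered by the group's coordinated checkpoint;
    inter-group messages are logged pessimistically so they can be replayed.
    """

    def __init__(self, groups: List[Set[int]]):
        self._groups = [set(g) for g in groups]
        self._group_of: Dict[int, int] = {}
        for gid, members in enumerate(self._groups):
            for p in members:
                if p in self._group_of:
                    raise ValueError(f"process {p} appears in more than one group")
                self._group_of[p] = gid

    def must_log(self, sender: int, receiver: int) -> bool:
        # Only messages that cross a group boundary are logged.
        return self._group_of[sender] != self._group_of[receiver]

    def rollback_set(self, failed: int) -> Set[int]:
        # The failed process's whole group rolls back to its coordinated
        # checkpoint; messages it received from other groups are replayed
        # from the log, so no other group has to roll back.
        return set(self._groups[self._group_of[failed]])

    def logged_messages(self, trace: Iterable[Tuple[int, int]]) -> int:
        # Number of (sender, receiver) messages in the trace that must be logged.
        return sum(1 for s, r in trace if self.must_log(s, r))
```

Under this reading, the configuration parameter is the partition itself. Coarser groups log fewer messages but enlarge the rollback scope after a failure, and finer groups do the opposite. Where a given system should sit between the two endpoints is what the dissertation's performance model is meant to decide.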
We believe that, once low overhead has been achieved through multithreading, a configurable coarse-grained pessimistic message logging protocol (i...
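The abstract describes a buffered, multithreaded design in which receiving, logging and computing overlap within one process. That design can be sketched in Python as a three-stage pipeline. The stage functions, the `STOP` sentinel and the list standing in for stable storage are assumptions for illustration; a full framework would also have sending and checkpointing threads. Each message reaches the computation only after the logger has recorded it, which is the pessimistic rule. Meanwhile, the receiver keeps accepting new messages.

```python
import queue
import threading

STOP = object()  # hypothetical end-of-stream sentinel


def run_pipeline(incoming, stable_log):
    """Receive, log and compute in separate threads connected by buffers.

    `stable_log` is a plain list standing in for stable storage (an assumption).
    Returns the computed results and, for each delivered message, how many
    log entries existed at the moment it was delivered.
    """
    recv_buf = queue.Queue()      # receiver -> logger
    deliver_buf = queue.Queue()   # logger -> computation
    results = []
    logged_at_delivery = []

    def receiver():
        for msg in incoming:
            recv_buf.put(msg)     # buffer the message; do not wait for logging
        recv_buf.put(STOP)

    def logger():
        while True:
            msg = recv_buf.get()
            if msg is STOP:
                deliver_buf.put(STOP)
                return
            stable_log.append(msg)    # a real system would force this to disk
            deliver_buf.put(msg)      # release only after the message is logged

    def compute():
        while True:
            msg = deliver_buf.get()
            if msg is STOP:
                return
            logged_at_delivery.append(len(stable_log))
            results.append(msg * 2)   # placeholder application work

    threads = [threading.Thread(target=f) for f in (receiver, logger, compute)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, logged_at_delivery
```

The buffers decouple the stages. A slow log write delays the delivery of that one message, but it does not stop the receiver from accepting the next one.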
Keywords/Search Tags: Software fault tolerance, Rollback Recovery, Checkpoint, Message logging, Overhead