
Index-based Quasi-synchronous Checkpointing Protocols In Distributed Systems

Posted on: 2006-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Luo
GTID: 2168360155462102
Subject: Computer software and theory
Abstract/Summary:
Distributed systems are now used widely, in client-server systems, transaction processing, the World Wide Web, and scientific computing, among many other areas. Fault tolerance is a central concern for these systems, and many techniques have been developed to improve their reliability and availability. Rollback recovery is one of them.

Rollback recovery achieves fault tolerance by periodically saving process states to stable storage during failure-free execution. Upon a failure, a failed process restarts from one of its saved states, thereby reducing the amount of lost computation. Each saved state is called a checkpoint. Optimizations developed for uniprocessor checkpointing, such as reducing the cost of taking a checkpoint and finding an optimal checkpoint interval, can also be adopted in distributed checkpointing. However, distributed systems complicate rollback recovery because messages induce inter-process dependencies during failure-free operation. It is therefore desirable to reduce checkpointing overhead while remaining free of the domino effect and ensuring consistent global checkpoints.

Distributed checkpointing can be broadly classified into three categories: asynchronous, synchronous, and quasi-synchronous. This thesis studies index-based quasi-synchronous checkpointing. It first reviews some important definitions and theorems and presents the prevailing index-based checkpointing protocols. Two improved schemes based on the previous protocols are then proposed, and together they form a new protocol. Simulation results show that the new protocol achieves performance improvements over the traditional ones. Finally, an existing approach to comparing index-based checkpointing protocols is presented together with some of its important conclusions, and the author offers a differing opinion.
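
As an illustration of the general index-based idea only (not the new protocol proposed in the thesis, whose details are not given here), the following minimal Python sketch follows the style of the classic index-based rule often attributed to Briatico, Ciuffoletti, and Simoncini. Each process numbers its checkpoints, piggybacks the current number on every outgoing message, and takes a forced checkpoint before delivering a message whose number is larger than its own. The class name `Process`, its fields, and the toy integer state are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Process:
    """Hypothetical process following a simple index-based checkpointing rule."""
    pid: int
    index: int = 0   # sequence number of the latest local checkpoint
    state: int = 0   # toy application state (assumption for this sketch)
    checkpoints: List[Tuple[int, int, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every process starts with an initial checkpoint at index 0.
        self.checkpoints.append((self.index, self.state, "initial"))

    def basic_checkpoint(self) -> None:
        """Checkpoint taken on the process's own schedule."""
        self.index += 1
        self.checkpoints.append((self.index, self.state, "basic"))

    def send(self, payload: int) -> Tuple[int, int]:
        """Piggyback the current checkpoint index on the outgoing message."""
        return (self.index, payload)

    def receive(self, message: Tuple[int, int]) -> None:
        """Take a forced checkpoint before delivery if the sender is ahead."""
        sender_index, payload = message
        if sender_index > self.index:
            self.index = sender_index
            self.checkpoints.append((self.index, self.state, "forced"))
        self.state += payload  # deliver the message to the application


# Usage: p checkpoints, then sends to q, which is forced to catch up first.
p, q = Process(0), Process(1)
p.basic_checkpoint()      # p's index becomes 1
q.receive(p.send(5))      # q takes a forced checkpoint, then delivers 5
p.receive(q.send(2))      # indices are equal, so no forced checkpoint
```

In protocols of this family, the forced checkpoint is taken before the message is delivered so that checkpoints carrying the same index can be combined into a consistent global checkpoint. The cost is the extra forced checkpoints, which is the overhead that index-based protocols try to reduce.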
Keywords/Search Tags:Software fault tolerance, distributed computation, checkpoint, domino effect, consistent global checkpoint.