Study On Backward Recovery Of Fault Tolerant Technology In Distributed Systems

Posted on: 2012-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Liu
GTID: 2218330338463716
Subject: Computer system architecture
Abstract/Summary:
In recent years, demand for highly reliable and highly available distributed computing systems has grown steadily, in applications such as global personal and military communication systems, air traffic control systems, network management platforms, and financial systems. As the scope of applications expands and the number of nodes in a distributed computing system increases, network heterogeneity becomes increasingly prominent; the software involved grows larger and more complex, and the probability of system failure rises accordingly. Without fault tolerance, an interrupted computation loses all the work done so far and must restart from scratch, which is intolerable: a computing task may take a very long time to complete, or may never complete at all. Research on fault-tolerant technology for distributed applications therefore has great theoretical significance and practical value. Backward recovery is an active area of fault-tolerance research, with the following directions: checkpointing algorithms (including improving the efficiency of taking checkpoints, reducing checkpointing overhead, and effectively controlling the volume of recovery); system models for fault-tolerant rollback recovery; performance evaluation and optimization strategies for the algorithms; fault characteristics and fault detection in distributed computing systems; and capture and recovery of process state. This work is part of the Natural Science Foundation of Shandong Province project "The research and implementation of fault-tolerant technology based backward recovery in heterogeneous distributed systems".
This thesis describes the current state of research on fault tolerance in distributed systems, the common faults in distributed systems, and the relevant concepts and definitions of fault-tolerant technology. It discusses the problems a distributed fault-tolerant system must resolve, such as orphan messages, in-transit messages, checkpoint overhead, and the domino effect, and introduces the conditions and theorems for eliminating checkpoint states that are not globally consistent. It analyzes the principles, performance, advantages, and disadvantages of the various checkpointing and message-logging techniques used in distributed fault-tolerant systems, identifies the bottlenecks that limit the performance of checkpointing algorithms, and studies the design principles of distributed fault-tolerant checkpointing algorithms, such as reducing the number of checkpoints, improving the efficiency of taking checkpoints, and reducing the number of control messages. The main contributions are as follows. First, the thesis analyzes the limitations of the Extended Finite State Machine and Fault Tolerant Mechanism, and improves the model. Second, it proposes an efficient non-blocking coordinated checkpointing algorithm in which more processes can take consistent global checkpoints concurrently. The algorithm reduces overhead by saving state asynchronously and by taking a checkpoint when the amount of state information to be saved is small, which greatly lowers checkpoint overhead and improves system performance. Third, it proposes an improvement on the ASNB algorithm that adapts better to processes with different sensitivities to faults, so that different processes can use different checkpoint intervals.
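The notions of orphan message, in-transit message, and consistent global checkpoint mentioned above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm; it only encodes the standard textbook definitions under a simplified model. The names `Message`, `classify`, and `is_consistent`, and the representation of a checkpoint as a count of recorded local events, are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical model: events on each process are numbered 0, 1, 2, ...
    sender: str
    send_event: int   # index of the send event in the sender's local history
    receiver: str
    recv_event: int   # index of the receive event in the receiver's local history


def classify(messages, cut):
    """Split messages into (orphans, in_transit) for a global checkpoint.

    `cut` maps each process name to the number of local events its
    checkpoint has recorded, so event i is recorded when i < cut[process].
    """
    orphans, in_transit = [], []
    for m in messages:
        sent_recorded = m.send_event < cut[m.sender]
        recv_recorded = m.recv_event < cut[m.receiver]
        if recv_recorded and not sent_recorded:
            # Receive is in the checkpoint but the send is not: orphan message.
            orphans.append(m)
        elif sent_recorded and not recv_recorded:
            # Send is in the checkpoint but the receive is not: in-transit message.
            in_transit.append(m)
    return orphans, in_transit


def is_consistent(messages, cut):
    """A global checkpoint is consistent when it contains no orphan message."""
    orphans, _ = classify(messages, cut)
    return not orphans
```

For a single message sent by P1 at event 2 and received by P2 at event 1, the cut `{"P1": 2, "P2": 2}` records the receive but not the send, so the message is an orphan and the cut is inconsistent. The cut `{"P1": 3, "P2": 1}` leaves the same message in transit, which keeps the cut consistent but means the message must be logged or resent on recovery.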
Keywords/Search Tags: distributed systems, fault-tolerance, checkpoint overhead, non-blocking, rollback recovery