The Design And Research Of Process Level Fault-tolerance Based On Checkpoint

Posted on: 2010-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z Xie
Full Text: PDF
GTID: 2178360272979340
Subject: Computer application technology
Abstract/Summary:
As computers become more powerful, software-implemented fault tolerance is attracting growing attention. It provides low-cost fault-tolerant computing and is flexible to implement, which makes it useful in many areas. As Linux becomes widely used, more and more applications are built on it, and the fault tolerance of those applications is increasingly important. Many research organizations have therefore improved the fault tolerance of applications on Linux by modifying its kernel.

A checkpoint facility saves the intermediate state of a running process to stable storage. When a failure occurs, users can resume execution of the process from the checkpoint file. This prevents long-running processes from losing the data they have generated because of program or system failures.

This thesis first reviews checkpoint technology and Linux process management, then introduces the key technologies and principles of checkpointing and identifies the difficulties and problems that remain to be solved. It then analyzes in detail the design and implementation of a checkpoint system for process-level fault tolerance. The system is composed of three modules: fault treatment, process monitoring, and checkpointing and rollback. Fault treatment consists of fault detection and fault analysis, and decides whether a process should be recovered or checkpointed. Process monitoring protects key processes in real time by checking whether they are running normally. Checkpointing and rollback covers two concerns, the checkpoint interval and the checkpointing operation itself. It backs up, rolls back, and restarts a process by saving and restoring the process context, the system context, and other state relevant to the running process.

Finally, the thesis adopts an improved AFOM model to predict the static optimum checkpoint interval for restart, which helps select an appropriate checkpoint interval and reduces overhead.
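
To illustrate what a static optimum checkpoint interval means, the sketch below uses Young's classic first-order approximation, not the improved AFOM model of the thesis, whose details are not given in this abstract. The function names, the parameter names, and the example figures (a 2-second checkpoint cost and a 10,000-second mean time between failures) are assumptions chosen for illustration only.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation: T = sqrt(2 * C * MTBF).

    checkpoint_cost_s: time to write one checkpoint (assumed constant).
    mtbf_s: mean time between failures (assumed known and constant).
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(interval_s: float, checkpoint_cost_s: float,
                      mtbf_s: float) -> float:
    """First-order expected overhead per unit of useful work.

    The first term is the cost of writing checkpoints; the second is the
    expected rework after a failure (on average half an interval is lost).
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    cost, mtbf = 2.0, 10_000.0  # hypothetical values
    best = young_interval(cost, mtbf)
    for interval in (best / 2, best, best * 2):
        print(f"interval={interval:7.1f}s  "
              f"overhead={overhead_fraction(interval, cost, mtbf):.4f}")
```

Under this simplified model, checkpointing more often than the optimum wastes time writing checkpoints, while checkpointing less often increases the work lost at each failure; the optimum balances the two terms.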
Keywords/Search Tags:fault-tolerance, process monitoring, checkpointing and rollback, checkpoint interval