
Incorporating File I/O Into Checkpointing Under Clusters Environment

Posted on: 2007-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2178360185454136
Subject: Computer system architecture
Abstract/Summary:
Checkpointing is an important fault-tolerance technique that lets a long-running application lose little work when a failure occurs. However, a running program may modify the contents of user files, and because file contents persist across a failure, rolling back only the process state can lead to an incorrect recovery. Incorporating file I/O into consistent checkpointing has therefore become a critical concern, in order to ensure consistency between the state of user file data and the rest of a process's state.

This thesis studies techniques for incorporating file I/O into checkpointing, presents a file checkpointing approach based on Shadow Block Copy, and describes the design and implementation of a file checkpointing system for Dawning clusters. The contributions of this thesis are as follows:

1. It studies the characteristics of file checkpointing and, based on an analysis and summary of existing file checkpointing systems, discusses the key issues in designing and implementing such a system. It also proposes solutions to these issues that fit the file access patterns observed on Dawning clusters.

2. It proposes a system-level file checkpointing framework consisting of three modules: a monitoring module, a checkpointing module, and a recovery module. The monitoring module records file modifications. The checkpointing module clears these records, and the recovery module uses them to recover when a failure occurs.

3. It presents a new file checkpointing approach that takes advantage of the Shadow Block Copy optimization. Because files are logically divided into disk blocks, this approach backs up only the modified blocks, which greatly reduces space and runtime overhead.

4. It designs and implements the file checkpointing system, which has been adopted on the Dawning supercomputer to support fault tolerance for long-running, I/O-intensive applications. Its functional correctness and performance are also analyzed.

The approach is implemented in the OS kernel and is transparent to users and programmers. Coordinated with process checkpointing, it can recover file contents and ensure consistency between the state of user file data and the rest of a process's state. It is reliable enough to tolerate any failure that could occur during the file checkpointing procedure itself. Experimental results show that it supports checkpointing and recovery of MPI applications running in the LAM/MPI programming environment.
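
The block-level idea summarized in the abstract can be illustrated with a small user-space sketch. This is not the thesis's kernel implementation, whose details the abstract does not give. It assumes an undo-style scheme in which the original content of a block is saved the first time the block is modified after a checkpoint, the saved blocks are discarded at the next checkpoint, and recovery writes them back. The class name `ShadowBlockFile`, its methods, the 4096-byte block size, and the in-memory record are all hypothetical; a real system would have to keep the record on stable storage to survive a crash.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size; the abstract does not state one


class ShadowBlockFile:
    """User-space sketch of block-level file checkpointing (hypothetical API)."""

    def __init__(self, path):
        # Create the file if needed, then open it for in-place reads and writes.
        if not os.path.exists(path):
            open(path, "wb").close()
        self.f = open(path, "r+b")
        self.checkpoint()

    def checkpoint(self):
        """Checkpointing step: flush current content and clear the record."""
        self.f.flush()
        os.fsync(self.f.fileno())
        self.shadow = {}  # block index -> block content at checkpoint time
        self.size_at_checkpoint = os.fstat(self.f.fileno()).st_size

    def write(self, offset, data):
        """Monitoring step: save each touched block once, then write through."""
        if not data:
            return
        first = offset // BLOCK_SIZE
        last = (offset + len(data) - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            if block not in self.shadow:
                self.f.seek(block * BLOCK_SIZE)
                # Blocks past end-of-file read back as b"" (nothing to restore).
                self.shadow[block] = self.f.read(BLOCK_SIZE)
        self.f.seek(offset)
        self.f.write(data)

    def recover(self):
        """Recovery step: roll the file back to its state at the last checkpoint."""
        for block, original in self.shadow.items():
            self.f.seek(block * BLOCK_SIZE)
            self.f.write(original)
        self.f.truncate(self.size_at_checkpoint)  # undo any growth of the file
        self.f.flush()
        self.shadow = {}


def demo():
    """Write, checkpoint, write again, then roll back to the checkpoint."""
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = ShadowBlockFile(os.path.join(tmp, "data.bin"))
        ckpt.write(0, b"A" * 5000)    # spans blocks 0 and 1
        ckpt.checkpoint()             # this state is now the recovery target
        ckpt.write(4090, b"B" * 20)   # modifies blocks 0 and 1
        ckpt.write(9000, b"C" * 10)   # extends the file into block 2
        saved_blocks = sorted(ckpt.shadow)
        ckpt.recover()
        ckpt.f.seek(0)
        content = ckpt.f.read()
        ckpt.f.close()
        return saved_blocks, content
```

In this sketch only the three touched blocks are recorded, regardless of the file's total size, which is the space saving the abstract attributes to backing up modified blocks rather than whole files.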
Keywords/Search Tags: Checkpointing, Fault-Tolerance, Recovery, File-checkpointing, Shadow-Block