Application And Development On Condor's Checkpoint Mechanism

Posted on: 2006-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: X D Wei
Full Text: PDF
GTID: 2168360155953122
Subject: Computer system architecture
Abstract/Summary:
A checkpoint mechanism, whose basic function is to save and restore the state of a process, can be invoked by the system at any time. Checkpointing means taking a snapshot of the current state of a program in such a way that the program can be restarted from that state at a later time. The state of the process is saved to a file called the Checkpoint File, and the moment at which the checkpoint is taken is called the Checkpoint Time. The Checkpoint File is later passed to a restart procedure, which resumes execution of the checkpointed process.
Condor, developed by the Condor Team at the University of Wisconsin-Madison, is a specialized batch system for managing a High-Throughput Computing (HTC) environment, that is, an environment that can deliver large amounts of computational power over a long period of time. A distinctive feature of the Condor implementation is that it uses its checkpoint mechanism to migrate processes.
This thesis was carried out against the background of the project "Dynamic Selecting Resources and Rescheduling Tasks on Computing Grids", sponsored by the Jilin Distinguished Young Scholars Fund. In it, we implement an independent, portable checkpoint facility named 'Checkpoint Package', based on an analysis of Condor. In the follow-up project we will focus on the compatibility of Checkpoint Package.
Based on a discussion of Condor's kernel, we describe the architecture and functions of Condor and the method by which Condor implements remote system calls. In Chapter 3, we introduce the ideas behind Condor's implementation of the checkpoint mechanism and, following these ideas, implement the independent, portable checkpoint facility 'Checkpoint Package'. Checkpoint Package, which comes with its own installation instructions and user manual, is intended for Linux users.
The checkpoint mechanism can be divided into two parts: ① creating the Checkpoint File at the checkpoint time; ② restarting the process. The Checkpoint File is of great importance in the checkpoint mechanism.
The Checkpoint File stores the stack segment, data segment, open-file information, code segment, symbol table, debug information and all registers. On restart, the process reads this state from the file, restoring the stack, shared library and data segments, file state, signal handlers, and pending signals. The checkpoint signal handler then returns to user code, which continues from where it left off when the checkpoint signal arrived.
Our Checkpoint Package is provided as a library. Once a user program has been linked with this library, the user can create a Checkpoint File at any time. To restart the program, the user only needs to run the program's Checkpoint File with the command supplied by Checkpoint Package. In addition, the library provides several C entry points that allow a program to control its own checkpointing behavior if needed. Besides the checkpoint library, Checkpoint Package contains the installation instructions, the user manual and test programs. The purpose of the test programs is to demonstrate the validity and practicality of Checkpoint Package and...
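The two parts described above, creating a checkpoint file when a checkpoint signal arrives and later restarting from that file, can be illustrated with a small application-level sketch. This is only an analogy under stated assumptions: it saves a toy job's Python variables with `pickle`, whereas the mechanism in the abstract saves process-level state such as the stack, data segments and registers. The class `Job`, its methods, the file name `job.ckpt` and the choice of `SIGUSR1` as the checkpoint signal are all hypothetical and are not taken from Condor or from Checkpoint Package.

```python
import os
import pickle
import signal
import tempfile


class Job:
    """Toy long-running job (hypothetical); its whole state is (i, total)."""

    def __init__(self, path, limit):
        self.path = path        # where the checkpoint file is written
        self.limit = limit      # total number of iterations to perform
        self.i = 0              # next iteration to execute
        self.total = 0          # running result
        self.checkpoint_requested = False

    def request_checkpoint(self, signum=None, frame=None):
        # Usable as a signal handler: it only sets a flag, so the file is
        # written later at a safe point between iterations.
        self.checkpoint_requested = True

    def write_checkpoint(self):
        # Part 1: create the checkpoint file. Writing to a temporary file
        # and renaming it avoids leaving a half-written checkpoint behind.
        tmp = self.path + ".tmp"
        state = {"i": self.i, "total": self.total, "limit": self.limit}
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, self.path)
        self.checkpoint_requested = False

    @classmethod
    def restart(cls, path):
        # Part 2: restart. Rebuild the job from the saved state.
        with open(path, "rb") as f:
            state = pickle.load(f)
        job = cls(path, state["limit"])
        job.i, job.total = state["i"], state["total"]
        return job

    def run(self, stop_after=None):
        # Returns True when the job is finished. `stop_after` limits the
        # number of iterations in this call, to simulate an interruption.
        steps = 0
        while self.i < self.limit:
            if stop_after is not None and steps == stop_after:
                return False
            self.total += self.i
            self.i += 1
            steps += 1
            if self.checkpoint_requested:
                self.write_checkpoint()
        return True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    job = Job(path, limit=10)
    # Assumption: SIGUSR1 plays the role of the checkpoint signal (Unix only).
    signal.signal(signal.SIGUSR1, job.request_checkpoint)
    job.run(stop_after=4)                   # work for a while
    os.kill(os.getpid(), signal.SIGUSR1)    # checkpoint signal arrives
    job.run(stop_after=1)                   # next safe point writes the file
    resumed = Job.restart(path)             # as if in a fresh process
    resumed.run()
    print(resumed.total)                    # prints 45
```

The handler only sets a flag and the file is written between iterations, so the saved state is always consistent. A process-level facility cannot rely on such safe points and has to capture the full process image instead.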
Keywords/Search Tags: Application