
Research On The Method Of Microreboot Oriented Parallel Computing Environment

Posted on: 2016-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: G Li
GTID: 2348330542976233
Subject: Computer Science and Technology
Abstract/Summary:
The reliability of mission-critical systems faces serious challenges as computation scales grow and application environments become more complex. A fault in any single process can cause the entire program to fail, and restarting the whole program incurs a large time overhead. In massively parallel computing environments in particular, the lack of effective fault-tolerance mechanisms means that program errors greatly reduce the efficiency with which the system executes its tasks. Checkpointing, on which microreboot is built, is a key technology for avoiding the large time overhead of a complete restart. Microreboot saves program state in checkpoint files and, when a fault occurs, improves the fault tolerance of the system through rollback recovery from those files. However, existing microreboot methods cannot meet the technical requirements of current environments because of the relationships among parallel processes. To achieve effective program recovery in a parallel computing environment, this thesis proposes a new microreboot method for such environments. By adding a code preprocessing step to checkpointing, the method preserves the consistency of process states without data pollution. By selecting portable variables, it reduces checkpoint file size, which saves storage space and shortens read and write time to external storage. In addition, the thesis proposes a program fault detection mechanism based on process status and a program recovery method based on the effective state line. By combining these two methods, the thesis automates program fault detection and program state recovery, improving both the reliability and the execution efficiency of the system.
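The general idea of checkpointing a selected subset of variables and rolling back to the last checkpoint after a fault can be illustrated with a minimal single-process sketch. This is not the method proposed in the thesis: it does not coordinate parallel processes, and the names `save_checkpoint`, `load_checkpoint`, `run`, `portable_keys`, and `fail_at` are hypothetical, chosen only for illustration. It assumes the selected variables are JSON-serializable.

```python
import json
import os
import tempfile


def save_checkpoint(path, state, portable_keys):
    """Write only the selected variables to a checkpoint file (hypothetical helper)."""
    snapshot = {key: state[key] for key in portable_keys}
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as f:
        json.dump(snapshot, f)
    # Rename over the old file so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Read the saved variables back from the checkpoint file."""
    with open(path) as f:
        return json.load(f)


def run(path, total_steps, fail_at=None):
    """Sum 0..total_steps-1, checkpointing after each step and resuming if possible."""
    state = {"step": 0, "acc": 0, "scratch": []}
    if os.path.exists(path):
        # Rollback recovery: continue from the last saved state instead of step 0.
        state.update(load_checkpoint(path))
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated fault")
        state["acc"] += state["step"]
        state["scratch"].append(state["acc"])  # large temporary data, not checkpointed
        state["step"] += 1
        save_checkpoint(path, state, ["step", "acc"])
    return state["acc"]
```

In this sketch, only `step` and `acc` are written to disk, so the checkpoint stays small while `scratch` is simply rebuilt after a restart. Calling `run` again with the same path after a simulated fault resumes from the last completed step rather than from the beginning.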
Keywords/Search Tags: parallel computing, availability, microreboot, checkpoint, state recovery