Research Of Rollback Recovery Based On Dependency Tracking And Message Counting

Posted on:2014-06-26

Degree:Master

Type:Thesis

Country:China

Candidate:G B Yuan

Full Text:PDF

GTID:2268330425484196

Subject:Software engineering

Abstract/Summary:

At present, a large number of scientific research and engineering applications run on distributed computing systems. As system scale and node counts grow, however, the probability of a failure during execution increases. To guarantee correct results after a failure or abnormal event, and to meet application requirements, the system must be fault tolerant. Rollback recovery, which achieves fault tolerance through time redundancy rather than node redundancy, is the mainstream technique for making high-performance distributed computing reliable. However, rollback recovery introduces considerable overhead in exchange for reliability, and this overhead greatly limits its application and development. Research that reduces the overhead of rollback recovery protocols and improves system execution efficiency is therefore of real significance. The main research in this thesis covers the following two aspects.

First, to address the high message logging overhead caused by synchronization constraints in traditional message logging protocols, this thesis proposes a lightweight message logging protocol based on dependency tracking. The protocol exploits the characteristics of message passing at run time and applies an information piggybacking strategy to relax the synchronization constraints of message logging. Message data is stored at the sender, which imposes no constraints. Message submission information is piggybacked on messages and saved at the dependent processes as the dependency extends, which likewise introduces no constraints. The saving process tracks the message submission information so that unnecessary transmission is avoided and the amount of piggybacked data is reduced, which is what makes the protocol lightweight. Experimental results show that, compared with the Egida protocol, the logging overhead and the checkpointing overhead are reduced by about 10%.

Second, to address the large blocking or coordination overhead of existing coordinated checkpointing protocols, this thesis proposes a non-blocking coordinated checkpointing protocol based on message counting. The protocol divides the run-time state of a process into three kinds and exploits the fact that, while a distributed parallel program runs, checkpointing is far more probable than failure. Using the information piggybacking strategy and a non-blocking execution mechanism, it transfers part of the coordination overhead from the checkpointing phase to the failure recovery phase. In addition, the protocol identifies each process's communication activity within a checkpoint interval so that processes avoid taking unnecessary checkpoints, which reduces the overall overhead of the checkpointing phase. Experimental results show that the coordinated checkpointing overhead is reduced by 20-40% compared with the two-stage checkpoint protocol, and by about 20% compared with the distributed snapshot protocol.
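The abstract does not give the details of the message-counting protocol, so the following is only a minimal Python sketch of the general idea, not the thesis's method. It assumes each process keeps per-peer send and receive counters, that comparing those counters reveals messages still in transit across a checkpoint line, and that a process with no communication in the current interval can skip its checkpoint. The names `Process`, `counters`, `needs_checkpoint`, `mark_checkpointed`, and `in_transit` are hypothetical, and the three process states mentioned above are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process model that only counts messages per peer."""
    pid: int
    sent: dict = field(default_factory=lambda: defaultdict(int))      # dst pid -> count
    received: dict = field(default_factory=lambda: defaultdict(int))  # src pid -> count
    communicated: bool = False  # any send/receive since the last checkpoint?

    def send(self, dst: "Process") -> dict:
        """Count an outgoing message and return it (delivery is up to the caller)."""
        self.sent[dst.pid] += 1
        self.communicated = True
        return {"src": self.pid, "dst": dst.pid}

    def receive(self, msg: dict) -> None:
        """Count an incoming message."""
        self.received[msg["src"]] += 1
        self.communicated = True

    def needs_checkpoint(self) -> bool:
        """Assumption: a process that did not communicate can skip its checkpoint."""
        return self.communicated

    def mark_checkpointed(self) -> None:
        """Start a new checkpoint interval."""
        self.communicated = False

    def counters(self) -> dict:
        """Copy of the counters, as they might be piggybacked or saved."""
        return {"pid": self.pid, "sent": dict(self.sent), "received": dict(self.received)}


def in_transit(snapshots: list) -> dict:
    """Return {(src, dst): n} for channels where n sent messages are not yet received."""
    recv_counts = {s["pid"]: s["received"] for s in snapshots}
    pending = {}
    for s in snapshots:
        src = s["pid"]
        for dst, n_sent in s["sent"].items():
            n_recv = recv_counts.get(dst, {}).get(src, 0)
            if n_sent > n_recv:
                pending[(src, dst)] = n_sent - n_recv
    return pending


if __name__ == "__main__":
    p0, p1, p2 = Process(0), Process(1), Process(2)
    p1.receive(p0.send(p1))  # first message is delivered
    p0.send(p1)              # second message is still in transit
    print(in_transit([p.counters() for p in (p0, p1, p2)]))
    print([p.needs_checkpoint() for p in (p0, p1, p2)])
```

In this sketch, a mismatch between the sender's and receiver's counts marks a channel whose messages would have to be handled at recovery time, which is one plausible way to defer coordination work out of the checkpointing phase.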

Keywords/Search Tags:

Rollback Recovery, Overhead, Coordinated Checkpoint, Message Logging, Dependency Tracking

Related items

1	The Research And Implementation Of Checkpoint Technology Based On WinNT
2	The Research On Low-overhead Rollback Recovery Fault-Tolerance Technology
3	Research On Incremental Checkpointing And Rollback Recovery
4	Fault-Tolerant Of MPI Programs Based On Rollback Recovery
5	Parallel Computing Environment Based On The Volume Of The Checkpoint Recovery Technology Research
6	Study On Backward Recovery Of Fault Tolerant Technology In Distributed Systems
7	Research On Low Overhead Non-blocking Checkpointing Scheme For Mobile Computing System
8	Research On Key Technology Of Coordinated Rollback-recovery Protocols In Cloud Platform
9	Research On Key Techniques Of Rollback Recovery In Mobile Computing Environment
10	Research On Rollback Recovery Fault-Tolerance Technology In High Availability Cluster