
A novel low-overhead recovery approach for distributed systems

Posted on: 2010-01-14
Degree: M.S
Type: Thesis
University: Southern Illinois University at Carbondale
Candidate: Kosaraju, Sundeepthi
Full Text: PDF
GTID: 2448390002981759
Subject: Computer Science
Abstract/Summary:
In this work we have addressed the complex problem of recovery from concurrent failures in a distributed computing environment. We have proposed a new checkpointing and recovery approach that enables each process to restart from its most recent checkpoint and therefore guarantees the least amount of recomputation after recovery. The proposed approach deals effectively with orphan and lost messages. We have introduced two new ideas. First, the value of the common checkpointing interval is chosen such that only the messages sent in the recent checkpoints of the processes need to be logged. Second, the lost messages are always determined a priori by the initiator process, in parallel with the normal distributed computation, so this step does not delay recovery in any way.
Keywords/Search Tags: Recovery, Distributed
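
For readers unfamiliar with the terms used in the abstract, the sketch below illustrates the textbook definitions of orphan and lost messages relative to a set of checkpoints. It is not the algorithm proposed in the thesis, which the abstract does not spell out. The names `Message`, `classify_messages`, and `checkpoint_time` are hypothetical, and the sketch assumes each process numbers its own events with a local counter and records the counter value at its latest checkpoint.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    """One application message (hypothetical record layout)."""
    msg_id: str
    sender: str
    receiver: str
    send_time: int             # sender's local event counter at send
    recv_time: Optional[int]   # receiver's local counter at receive, None if still in transit


def classify_messages(
    messages: List[Message],
    checkpoint_time: Dict[str, int],
) -> Tuple[List[str], List[str]]:
    """Return (orphan_ids, lost_ids) if every process restarts from its checkpoint.

    Orphan: the receive is recorded in the receiver's checkpoint,
            but the send is not recorded in the sender's checkpoint.
    Lost:   the send is recorded in the sender's checkpoint,
            but the receive is not recorded in the receiver's checkpoint.
    """
    orphans: List[str] = []
    lost: List[str] = []
    for m in messages:
        sent_before = m.send_time <= checkpoint_time[m.sender]
        received_before = (
            m.recv_time is not None
            and m.recv_time <= checkpoint_time[m.receiver]
        )
        if received_before and not sent_before:
            orphans.append(m.msg_id)
        elif sent_before and not received_before:
            lost.append(m.msg_id)
    return orphans, lost


# Illustrative data only: two processes, each checkpointed at local event 5.
checkpoint_time = {"P1": 5, "P2": 5}
messages = [
    Message("m1", "P1", "P2", send_time=3, recv_time=4),  # consistent
    Message("m2", "P1", "P2", send_time=6, recv_time=4),  # orphan
    Message("m3", "P2", "P1", send_time=4, recv_time=7),  # lost
    Message("m4", "P2", "P1", send_time=6, recv_time=8),  # after both checkpoints
]
orphans, lost = classify_messages(messages, checkpoint_time)
```

Under these definitions a lost message has to be replayed from a sender-side log after rollback, which is why the abstract ties the checkpointing interval to the set of messages that must be logged.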