
System support for service availability, remote healing and fault tolerance using lazy state propagation

Posted on: 2005-01-08
Degree: Ph.D
Type: Thesis
University: Rutgers The State University of New Jersey - New Brunswick
Candidate: Sultan, Florin
Full Text: PDF
GTID: 2458390008499958
Subject: Computer Science
Abstract/Summary:
Our thesis is that lazy state propagation can be successfully used to implement efficient support for service availability, remote healing and fault tolerance.

The end-to-end availability of an Internet service is currently constrained by the static client-server binding imposed by the TCP/IP protocol. To overcome this problem, we propose lazy migration of live client service sessions between equivalent servers. We have designed and implemented Service Continuations, an OS mechanism for session state migration between multi-process servers, along with Migratory TCP, a connection migration protocol that enables lazy session migration. We present experimental results with real Internet servers that validate the approach.

Failure of the OS, or damage to its state, can lead to loss of critical application and OS state residing in system memory. As a solution to this problem, we propose remote healing through lazy recovery/repair actions on the in-memory software state of a computer system. To enable remote healing, we have designed and implemented Backdoors, a novel system architecture based on remote memory communication that allows access to the resources of a machine even after an OS failure renders it unavailable. We present experimental results showing that Backdoors achieves efficient monitoring and fast recovery and repair.

Distributed shared memory (DSM) systems used to run parallel applications on large commodity clusters are sensitive to individual node failures that compromise the whole computation. We have designed and implemented an efficient fault-tolerant DSM system for which we have developed two lazy algorithms for garbage collection of recovery state. We demonstrate through experiments with benchmark applications that our recovery support is lightweight and that lazy garbage collection effectively limits the amount of recovery state retained in the system.
Keywords/Search Tags: State, Lazy, Remote healing, Support, System, Service, Availability, Recovery