Research On High Availability Of Distributed Mission-Critical System

Posted on: 2007-10-05   Degree: Master   Type: Thesis
Country: China   Candidate: Y Zhu   Full Text: PDF
GTID: 2178360185466949   Subject: Computer application technology
Abstract/Summary:
"If a problem has no solution, it may not be a problem, but a fact, not to be solved, but to be coped with over time," said Shimon Peres, 1994 Nobel Peace Prize laureate. This quote has become the mantra of the Recovery-Oriented Computing (ROC) project, and this paper likewise treats crashes, hangs, operator errors, and hardware malfunctions as facts, coping with these inevitable failures through fast recovery.

Microreboot is a new technique for fast, inexpensive recovery in large-scale distributed application systems. This paper analyzes its principles and deployment strategies in detail and summarizes the properties of crash-only software design. It then analyzes the problems that can arise when microreboot is applied and, in particular, uses an example to describe how the r-map evolves.

Building on this study of microreboot, the paper designs a model for componentized distributed application systems named SHMM (Self-Healing Model based on Microreboot). The model covers three aspects: macroanalysis for fault detection and localization, microrebooting for rapid recovery, and external management of recovery actions. It shows that application systems can recover from transient and intermittent failures autonomously and quickly, in an application-generic way and without human assistance. Finally, the paper proposes "component-level rejuvenation" based on microreboot and designs both time-based and measurement-based variants, following traditional rejuvenation strategies. This work can guide the implementation of high availability in distributed application systems.
Keywords/Search Tags: High Availability, Microreboot, Crash-only Software, Self-Healing Model, Component-level Rejuvenation
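
The component-level rejuvenation idea summarized in the abstract could be sketched roughly as follows. This is only an illustrative Python sketch, not the thesis's design: the `Component` record, the memory metric, the thresholds, and every function name are hypothetical, and a real microreboot would restart a component inside an application server rather than reset fields on an object.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical component record; all field names are illustrative."""
    name: str
    started_at: float   # time of the last (micro)reboot, in seconds
    memory_mb: float    # one sample resource measurement
    reboots: int = 0

    def microreboot(self, now: float) -> None:
        # Models restarting only this component, not the whole application:
        # its age and its measured resource use start over.
        self.started_at = now
        self.memory_mb = 0.0
        self.reboots += 1


def due_time_based(c: Component, now: float, max_age_s: float) -> bool:
    """Time-based policy: rejuvenate once the component has run long enough."""
    return now - c.started_at >= max_age_s


def due_measurement_based(c: Component, memory_limit_mb: float) -> bool:
    """Measurement-based policy: rejuvenate when a monitored metric crosses a limit."""
    return c.memory_mb >= memory_limit_mb


def rejuvenate(components, now, max_age_s, memory_limit_mb):
    """Microreboot every component flagged by either policy; return their names."""
    rebooted = []
    for c in components:
        if due_time_based(c, now, max_age_s) or due_measurement_based(c, memory_limit_mb):
            c.microreboot(now)
            rebooted.append(c.name)
    return rebooted
```

Combining the two policies with a logical "or" is a design choice made for this sketch; either policy could equally be used on its own.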