
Research On Failure-Recovery-Oriented Undo Method

Posted on: 2012-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zheng
Full Text: PDF
GTID: 2218330368982424
Subject: Computer application technology
Abstract/Summary:
As system complexity, heterogeneity, and dynamism increase day by day and external attack methods keep evolving, managing and maintaining mission-critical systems has become increasingly difficult, and operating errors occur frequently. Such systems are under constant threat of failure, such as task interruption, software malfunction, or even a complete crash. Failure recovery for distributed mission-critical systems has therefore become a serious challenge. A large body of statistics shows that most system failures are related to human error. For failures caused by random, sudden operator errors, simply improving hardware and software performance does not help and may even make the system less reliable.

To address this issue, the undo recovery mechanism was proposed on the basis of earlier checkpoint technology, giving the system the ability to retract and correct user operations. Because it offers flexible control over user operations and clear advantages in recovery efficiency, undo recovery has received wide attention from researchers and has become an effective means of handling system failures and protecting system data.

This thesis analyzes in detail the causes, types, prevention, and recovery of system failures. Given that operational errors account for a growing share of the causes of system failure, it designs and implements a hierarchical undo method based on operation increments, drawing on the traditional undo/redo mechanism and the existing 3R approach. First, the concepts related to operation increments are defined in a formal description language. On this basis, the thesis builds a hierarchical undo model for distributed mission-critical systems and proposes a method for constructing operation increments and a method for repairing misoperations. A classified compensation strategy is then used to resolve the inconsistencies that may arise during undo recovery. The results show that, compared with traditional rollback recovery methods, the proposed method reduces recovery granularity and cost, increases recovery speed, and achieves higher recovery efficiency.
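The abstract does not give the thesis's formal definitions, so the following is only a minimal sketch of the general idea of undoing a single operation from a recorded increment instead of rolling the whole system back to a checkpoint. Every name here (`Increment`, `UndoableStore`, `put`, `undo`, and the status strings) is hypothetical, and the two-way split between operations that can be undone directly and operations that need compensation is an assumed stand-in for the classified compensation strategy, not the author's actual classification.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

_MISSING = object()  # sentinel: the key did not exist before the operation


@dataclass
class Increment:
    """Hypothetical record of what one operation changed."""
    op_id: int
    key: str
    before: Any
    after: Any
    undone: bool = False


@dataclass
class UndoableStore:
    """Toy key-value store that logs one increment per write."""
    data: Dict[str, Any] = field(default_factory=dict)
    log: List[Increment] = field(default_factory=list)

    def put(self, key: str, value: Any) -> int:
        # Record the old and new value so this one operation can be reverted.
        inc = Increment(len(self.log), key, self.data.get(key, _MISSING), value)
        self.data[key] = value
        self.log.append(inc)
        return inc.op_id

    def undo(self, op_id: int) -> str:
        inc = self.log[op_id]
        if inc.undone:
            return "already-undone"
        # Assumed classification: if a later, still-active operation wrote
        # the same key, restoring the old value would clobber it, so the
        # undo is flagged for compensation instead of being applied.
        dependents = [i for i in self.log[op_id + 1:]
                      if i.key == inc.key and not i.undone]
        if dependents:
            return "needs-compensation"
        if inc.before is _MISSING:
            self.data.pop(inc.key, None)
        else:
            self.data[inc.key] = inc.before
        inc.undone = True
        return "undone"
```

In this sketch, undoing an operation whose key was not touched afterwards only reverts that one key, which illustrates the finer recovery granularity the abstract contrasts with checkpoint rollback. A real compensation step for the flagged case would depend on application semantics that the abstract does not describe.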
Keywords/Search Tags: failure recovery, undo methods, operation increment, operation repair, inconsistency