
Self-healing Based Model Construction And Policy Research For Computing Systems

Posted on: 2007-10-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J W Wang
GTID: 1118360185491851
Subject: Computer application technology
Abstract/Summary:
Software aging [1], a phenomenon in which error conditions accrue with time and/or load, has been observed in practice and can eventually make a system unavailable. In systems with high reliability and availability requirements, software aging can cause outages that result in high costs. Studies have reported the transient nature of software failures [2,3], for which design diversity is not very helpful. Transient failures typically occur because design faults in the software lead to unacceptable erroneous states in the OS environment of the process. Hence environment diversity, a generalization of system restart [4], has been proposed as a cheap yet effective technique for software fault tolerance [5,6]. The basic idea is to modify the operating environment of the running process.
To counteract software aging, a proactive technique called software self-healing has been proposed. It involves occasionally stopping the running software, "cleaning" its internal state, and restarting it. Garbage collection, flushing operating system kernel tables, and reinitializing internal data structures are examples of what cleaning the internal state of software might involve.
Traditional fault tolerance techniques are reactive in nature, and environment diversity has so far typically been applied on a corrective basis. Proactive fault management, as the name implies, instead takes suitable corrective action to prevent a failure before the system experiences a fault.
In this thesis, an analytical multi-level model based on a semi-Markov process is presented for self-healing at the system level, the service level, and the process level. By analyzing the local and global kernels of the semi-Markov model, we obtain the maximized steady-state availabilities and the optimal self-healing schedules under the different level scenarios.
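The trade-off behind an optimal self-healing schedule can be illustrated with a model much simpler than the multi-level semi-Markov model of the thesis. The sketch below is an illustration under stated assumptions, not the thesis's model: it assumes a single level, a Weibull time to failure, a fixed mean repair time after a failure, a shorter mean self-healing time, and healing triggered at age `t0` if no failure has occurred first. Steady-state availability then follows from a renewal-reward argument as mean uptime per cycle divided by mean cycle length. All names (`availability`, `best_schedule`, `mean_repair`, `mean_heal`) are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability that no failure has occurred by age t (assumed Weibull aging)."""
    return math.exp(-((t / scale) ** shape))


def availability(t0, shape, scale, mean_repair, mean_heal, steps=2000):
    """Steady-state availability when self-healing is triggered at age t0.

    A(t0) = U / (U + mean_repair * F(t0) + mean_heal * (1 - F(t0))),
    where U is the integral of the survival function over [0, t0].
    """
    # Mean uptime per cycle, integrated with the trapezoid rule.
    h = t0 / steps
    total = 0.5 * (weibull_survival(0.0, shape, scale)
                   + weibull_survival(t0, shape, scale))
    for i in range(1, steps):
        total += weibull_survival(i * h, shape, scale)
    uptime = total * h

    # A cycle ends either with a failure (long repair) or with healing (short).
    p_fail = 1.0 - weibull_survival(t0, shape, scale)
    downtime = mean_repair * p_fail + mean_heal * (1.0 - p_fail)
    return uptime / (uptime + downtime)


def best_schedule(candidates, **params):
    """Grid search: the candidate healing age with the highest availability."""
    return max(candidates, key=lambda t0: availability(t0, **params))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    params = dict(shape=2.0, scale=100.0, mean_repair=10.0, mean_heal=1.0)
    t_opt = best_schedule(range(5, 301, 5), **params)
    print(t_opt, availability(t_opt, **params))
```

Under these assumptions, an increasing failure rate (shape greater than 1) combined with healing that is cheaper than repair gives an interior optimum. With shape equal to 1 (exponential lifetimes, no aging) the largest candidate wins, because healing a non-aging system only adds downtime.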
Aided by numerical examples, we illustrate that the fine-grained strategy can lower the cost of self-healing and improve system availability, and that the system parameters are the key factors in deciding which level's self-healing policy should be executed.
Statistical algorithms are then developed to estimate the optimal software self-healing schedules, provided that a complete sample of failure-time data is given. The optimal software self-healing schedules that minimize the expected self-healing cost per unit time or maximize the system availability are derived under Statistical TTT...
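The sentence above is truncated in the source. Assuming "TTT" denotes the total time on test statistic, a nonparametric schedule estimate from a complete sample can be sketched as follows. This is a generic age-replacement illustration, not the thesis's algorithm. For ordered failure times x_(1) ≤ ... ≤ x_(n), the scaled TTT value is φ_j = ψ_j / ψ_n with ψ_j = Σ_{k≤j} (n − k + 1)(x_(k) − x_(k−1)) and x_(0) = 0. The estimated schedule is the x_(j) that maximizes φ_j / (j/n + β), where β = c_heal / (c_fail − c_heal). The names `scaled_ttt`, `estimate_schedule`, `cost_failure`, and `cost_heal` are hypothetical.

```python
def scaled_ttt(failure_times):
    """Return the sorted sample and its scaled total-time-on-test values."""
    xs = sorted(failure_times)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one failure time")
    raw = []
    total = 0.0
    prev = 0.0
    for k, x in enumerate(xs, start=1):
        # (n - k + 1) units are still under test during the k-th interval.
        total += (n - k + 1) * (x - prev)
        prev = x
        raw.append(total)
    return xs, [t / raw[-1] for t in raw]


def estimate_schedule(failure_times, cost_failure, cost_heal):
    """Estimate the healing age that minimizes the expected cost per unit time.

    Assumes a failure costs more than a planned self-healing action.
    Returning the largest observation means the sample gives no evidence
    that healing before that age pays off.
    """
    if cost_failure <= cost_heal:
        raise ValueError("cost_failure must exceed cost_heal")
    xs, phi = scaled_ttt(failure_times)
    n = len(xs)
    beta = cost_heal / (cost_failure - cost_heal)
    # Steepest line from (-beta, 0) to a point (j/n, phi_j) of the TTT plot.
    best_j = max(range(1, n + 1), key=lambda j: phi[j - 1] / (j / n + beta))
    return xs[best_j - 1]


if __name__ == "__main__":
    # Hypothetical failure times and costs, for illustration only.
    sample = [12.0, 30.5, 41.2, 55.0, 58.3, 63.9, 70.1, 88.4]
    print(estimate_schedule(sample, cost_failure=10.0, cost_heal=1.0))
```

Replacing the two costs with the mean repair time and the mean self-healing time gives the same criterion for maximizing availability, since both objectives reduce to a ratio of expected penalty per cycle to expected uptime per cycle.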
Keywords/Search Tags:software failure, software tolerance, proactive tolerance, self-healing, system availability, system robustness