Proactive management of software systems: Analysis and implementation

Posted on: 2003-04-13
Degree: Ph.D
Type: Thesis
University: Duke University
Candidate: Vaidyanathan, Kalyanaraman
Full Text: PDF
GTID: 2468390011479331
Subject: Engineering
Abstract/Summary:
Software aging, a phenomenon in which the state of a software system degrades over time, has recently been reported. Its primary causes are the exhaustion of operating system and middleware resources, fragmentation of these resources, data corruption, and accumulation of numerical errors. Eventually, software aging may lead to performance degradation, security compromise, or crash/hang failure. To counteract software aging, a proactive approach to fault management called software rejuvenation has been proposed. It essentially involves occasionally terminating an application or a system, cleaning its internal state, and restarting it.

In this thesis, we first extend the traditional classification of software faults (deterministic and transient) to include faults attributed to software aging, and study the treatment and recovery strategies for each fault class. This helps us understand the nature of software faults and their impact on system availability and performance, and aids in choosing the best recovery strategy when a fault is triggered.

Next, we discuss methods for evaluating the effectiveness of proactive fault management in operational software systems and for determining optimal times to perform rejuvenation. We take a two-pronged approach: measurement-based modeling and analytic modeling.

The measurement-based approach detects software aging and predicts aging-related failures by collecting and analyzing system data, so that proactive methods can be applied to prevent unplanned outages. To quantify the effect of aging on system resources, we propose a metric called the estimated time to exhaustion. The resulting measurement-based models are an important step toward predicting aging-related failures from actual measurements, and are intended to support the development of policies that automate the proactive handling of potential problems.

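As an illustration of how an estimated time to exhaustion might be computed, the sketch below fits a robust linear trend to resource usage samples and extrapolates to a capacity limit. The abstract does not define the metric, so the trend estimator (median of pairwise slopes), the function names, and the sample data are all assumptions made for this example rather than the method used in the thesis.

```python
from itertools import combinations
from statistics import median


def sen_slope(times, values):
    """Median of all pairwise slopes: a robust estimate of a linear trend.

    Assumption: a linear trend is an adequate local model of resource usage.
    """
    slopes = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in combinations(zip(times, values), 2)
        if t2 != t1
    ]
    return median(slopes)


def estimated_time_to_exhaustion(times, used, capacity):
    """Time from the last sample until usage reaches capacity.

    Returns None when no upward trend is detected, since the resource
    is then not projected to run out.
    """
    slope = sen_slope(times, used)
    if slope <= 0:
        return None
    return (capacity - used[-1]) / slope


# Hypothetical samples: hours since start and memory in use (MB).
hours = [0, 1, 2, 3, 4, 5]
used_mb = [100, 112, 119, 131, 140, 150]
ette = estimated_time_to_exhaustion(hours, used_mb, capacity=512)
print(f"Estimated time to exhaustion: {ette:.1f} hours")
```

With these made-up samples the median slope is 10 MB per hour, so the remaining 362 MB would be projected to last about 36.2 hours. A rejuvenation policy could compare such an estimate against a threshold to decide when to act.
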
The aim of the analytic modeling is to determine optimal times to perform rejuvenation by developing and analyzing stochastic models that maximize availability or minimize downtime cost. Using stochastic reward nets (SRNs), we model and analyze different rejuvenation policies for a cluster system. We also model inspection-based preventive maintenance in systems whose degradation level can be determined through observable parameters. This model is solved using Markov regenerative process (MRGP) theory to obtain optimal rejuvenation strategies.

We then describe the design and implementation of a software rejuvenation agent in a major commercial server.

The measurement-based model is then combined with an earlier analytic model to obtain a comprehensive model of software aging and rejuvenation.

Finally, we summarize the contributions of the thesis and classify the approaches and current methods of rejuvenation.
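
To make the goal of the analytic modeling concrete, the following sketch searches for a rejuvenation interval that maximizes steady-state availability. It is not the SRN or MRGP model of the thesis; it substitutes a much simpler age-based renewal model, in which the system is rejuvenated at age T unless it fails first. The Weibull failure distribution, all parameter values, and the function names are hypothetical.

```python
import math


def survival(t, shape, scale):
    """Weibull survival function; shape > 1 models aging (rising failure rate)."""
    return math.exp(-((t / scale) ** shape))


def availability(T, shape, scale, repair_time, rejuv_time, steps=2000):
    """Steady-state availability when rejuvenating at age T.

    Renewal-reward argument: expected uptime per cycle divided by
    expected cycle length (uptime plus expected downtime).
    """
    h = T / steps
    # Trapezoidal rule for the integral of the survival function over [0, T].
    inner = sum(survival(i * h, shape, scale) for i in range(1, steps))
    uptime = h * (0.5 * (survival(0.0, shape, scale) + survival(T, shape, scale)) + inner)
    p_fail = 1.0 - survival(T, shape, scale)
    # A failure costs repair_time; a planned rejuvenation costs rejuv_time.
    downtime = repair_time * p_fail + rejuv_time * (1.0 - p_fail)
    return uptime / (uptime + downtime)


# Hypothetical parameters, all in hours.
params = dict(shape=2.0, scale=1000.0, repair_time=10.0, rejuv_time=1.0)
candidates = range(10, 3001, 10)
best = max(candidates, key=lambda T: availability(T, **params))
print(f"Best interval: {best} h, availability {availability(best, **params):.5f}")
```

Rejuvenation pays off in this model only because a planned restart is assumed to be cheaper than an unplanned repair and the failure rate increases with age. With a constant failure rate (shape equal to 1), the search would favor the longest interval in the grid.
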
Keywords/Search Tags:Software, System, Rejuvenation, Proactive, Management