Fault Management For Distributed Internet Services

Posted on: 2010-09-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L W Chu
GTID: 1118360278965444
Subject: Computer Science and Technology

Abstract/Summary:
Over the last few years, various distributed service systems have appeared on the Internet, such as web services, IPTV, VoIP and other value-added services. To retain regular customers and attract new users, service providers (SPs) must guarantee quality of service (QoS) for their services. However, the devices, networks and application servers involved may sometimes behave anomalously, which degrades the QoS of services or even makes them unavailable. To guarantee QoS, SPs need an effective service fault management mechanism that can detect failures, analyze their causes and apply countermeasures as soon as possible.

This dissertation focuses on the fault management architecture and related algorithms for distributed services. It examines graph theory based fault management, the service management model, active probing based fault management, the analysis and improvement of fault diagnosis algorithms in both static and dynamic service scenarios, and a multi-domain fault management framework for distributed services. The main contributions are as follows:

(1) A service fault management approach based on active probing is proposed for uncertain and noisy environments. The approach consists of two phases: fault detection and fault diagnosis. For the first phase, we propose a greedy approximation probe selection algorithm (GAPSA), which selects a minimal set of probes while maintaining a high probability of fault detection. For the second phase, we propose a fault diagnosis probe selection algorithm (FDPSA), which selects further probes to obtain more system information based on the symptoms observed in the first phase. Simulation results demonstrate the validity and efficiency of the approach.

(2) Event-driven fault diagnosis algorithms based on incremental belief assessment are proposed for large-scale service systems. An incremental fault belief assessment method analyzes symptoms and computes posterior fault probabilities in an event-driven manner. Based on this method, we propose a greedy fault diagnosis algorithm that produces a sub-optimal explanation. To reduce the complexity of fault selection, we transform the fault diagnosis problem of finding the most probable explanation (MPE) into finding the most likely assignment of each fault, and propose a corresponding algorithm. Simulation results show that the algorithms achieve high fault diagnosis accuracy and save a great deal of diagnosis time.

(3) A fault diagnosis algorithm for dynamic service scenarios is proposed. Dynamic changes in the service environment affect fault diagnosis accuracy. To reduce this impact, we analyze the challenges of fault diagnosis in a dynamic environment. To deal with the dynamic fault set caused by the fault recovery mechanism, we modify the prior fault probability based on statistics of fault persistence time. To deal with the dynamic model, we build an expected model from the times at which symptoms are observed and the original models within the current time window. Simulation results show that the fault diagnosis algorithm is efficient in a dynamic service environment.

(4) A fault diagnosis algorithm for multi-domain service scenarios is proposed. In a multi-domain environment, symptoms caused by inter-domain fault propagation affect the fault diagnosis algorithm. We propose a distributed dependency model to capture the dependencies in the service system. Based on this dependency model, a distributed fault diagnosis algorithm is proposed, and we improve it in three respects: reduced communication cost, a more accurate effect evaluation function, and the handling of spurious symptom probability. Simulation results show that the fault diagnosis algorithm is efficient in a multi-domain service environment.
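To make the greedy probe selection idea in contribution (1) concrete, here is a minimal sketch in Python. It is not the dissertation's GAPSA, whose details the abstract does not give. It is a deterministic simplification that treats probe selection as a set cover problem and ignores the noise and detection probabilities that the dissertation accounts for. The function name greedy_probe_selection, the probe names and the component names are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_probe_selection(probes: Dict[str, Set[str]],
                           components: Set[str]) -> List[str]:
    """Greedy set-cover sketch (an assumption, not the dissertation's GAPSA).

    probes maps a probe name to the set of components it exercises.
    Returns probes chosen so that every coverable component is
    exercised by at least one selected probe.
    """
    uncovered = set(components)
    chosen: List[str] = []
    while uncovered and probes:
        # Take the probe that covers the most still-uncovered components.
        # Iterating in sorted order makes ties resolve by probe name.
        best = max(sorted(probes), key=lambda p: len(probes[p] & uncovered))
        gain = probes[best] & uncovered
        if not gain:
            break  # the remaining components are not reachable by any probe
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    # Hypothetical probes and the components each one passes through.
    example_probes = {
        "p1": {"web", "app"},
        "p2": {"app", "db"},
        "p3": {"web", "app", "db", "cache"},
        "p4": {"dns"},
    }
    example_components = {"web", "app", "db", "cache", "dns"}
    print(greedy_probe_selection(example_probes, example_components))
```

In this example the sketch picks p3 first because it covers four components, then p4 for the one that remains. A probabilistic variant would presumably replace the coverage count with an estimate of detection probability gained per probe, which is the kind of criterion the abstract attributes to GAPSA.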
Keywords/Search Tags: distributed Internet service, service fault management, fault diagnosis, active probing, event-driven algorithm, multi-domain service