Probabilistic inference for diagnosing service failures in communication systems

Posted on: 2004-04-03
Degree: Ph.D
Type: Thesis
University: University of Delaware
Candidate: Steinder, Malgorzata
Full Text: PDF
GTID: 2468390011958446
Subject: Computer Science
Abstract/Summary:
Fault localization is the process of deducing the exact source of a failure (a root cause) from a set of observed failure indications. Today's communication systems require techniques capable of multi-layer integrated diagnosis, diagnosis of performance problems, reasoning under uncertainty about the system's structure and state, isolation of multiple simultaneous faults, and event-driven, incremental diagnosis, all with high accuracy and low complexity.

This dissertation addresses these challenges in designing a fault propagation model and algorithms for probabilistic fault localization. The proposed fault propagation model is a probabilistic multi-layer dependency graph that incorporates both availability and performance problems and allows arbitrary relationships among system components to be represented. For the purpose of fault diagnosis, the dissertation adopts two Bayesian inference algorithms that calculate belief-updating and most-probable-explanation queries in singly-connected belief networks to perform fault localization in belief networks with loops. The dissertation also proposes a novel fault localization algorithm based on incremental hypothesis updating to perform diagnosis with bipartite fault propagation models. These fault localization techniques are extended to incorporate reasoning with positive symptoms and to be resilient to noise in the alarm data. The algorithms are evaluated using the problem of end-to-end service failure diagnosis as a case study. The dissertation defines this problem as the task of isolating host-to-host service failures responsible for failures of end-to-end services.

Although the algorithms based on iterative belief updating and incremental hypothesis updating introduced in this dissertation efficiently diagnose multi-fault end-to-end failure scenarios in networks composed of tens of nodes, they do not scale well to networks composed of hundreds or thousands of nodes.
To address this scalability problem, the dissertation introduces a multi-domain fault localization approach to end-to-end service failure diagnosis in hierarchically routed networks. The multi-domain approach divides the computational effort and system knowledge involved in end-to-end service-failure diagnosis among multiple hierarchically organized managers. The dissertation first proposes an algorithmic framework for the design of probabilistic techniques of multi-domain fault localization. Then, it introduces two specific techniques that expand on the centralized algorithms introduced in the dissertation: iterative belief updating and incremental hypothesis updating.
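
To make the idea of a bipartite fault propagation model concrete, the following is a minimal Python sketch, not the dissertation's algorithm. It assumes a noisy-OR combination of causes, and every name and number in it (`PRIOR`, `CAUSES`, the link and path labels, the probabilities) is hypothetical. It finds the most probable explanation by brute-force enumeration of all fault sets, which is exponential in the number of faults and is precisely the cost that approaches such as incremental hypothesis updating are meant to avoid.

```python
from itertools import combinations

# Hypothetical prior failure probabilities of host-to-host services (the faults).
PRIOR = {"link_ab": 0.01, "link_bc": 0.02, "link_cd": 0.01}

# Hypothetical bipartite model: CAUSES[symptom][fault] is the assumed probability
# that the fault, when present, triggers the end-to-end failure symptom.
CAUSES = {
    "path_ac_down": {"link_ab": 0.9, "link_bc": 0.9},
    "path_bd_down": {"link_bc": 0.9, "link_cd": 0.9},
    "path_ad_down": {"link_ab": 0.9, "link_bc": 0.9, "link_cd": 0.9},
}


def symptom_probability(symptom, faults):
    """Noisy-OR (an assumption): the symptom is absent only if every
    present fault independently fails to trigger it."""
    p_absent = 1.0
    for fault in faults:
        p_absent *= 1.0 - CAUSES[symptom].get(fault, 0.0)
    return 1.0 - p_absent


def hypothesis_score(faults, observed):
    """Joint probability of a fault set and the complete observation.
    Symptoms missing from `observed` count as positive symptoms,
    i.e. evidence that the corresponding service works."""
    score = 1.0
    for fault, p in PRIOR.items():
        score *= p if fault in faults else 1.0 - p
    for symptom in CAUSES:
        p = symptom_probability(symptom, faults)
        score *= p if symptom in observed else 1.0 - p
    return score


def most_probable_explanation(observed):
    """Exhaustively score every subset of faults and return the best one."""
    faults = sorted(PRIOR)
    candidates = (
        frozenset(subset)
        for size in range(len(faults) + 1)
        for subset in combinations(faults, size)
    )
    return max(candidates, key=lambda h: hypothesis_score(h, observed))


if __name__ == "__main__":
    all_down = {"path_ac_down", "path_bd_down", "path_ad_down"}
    print(sorted(most_probable_explanation(all_down)))
    print(sorted(most_probable_explanation({"path_ac_down"})))
```

In this toy model, a failure of all three paths is best explained by the single shared link, while a failure of only one path points to a different link because the two working paths act as positive symptoms that argue against the shared one.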
Keywords/Search Tags:Fault localization, Failure, Incremental hypothesis updating, Dissertation, Probabilistic, Service, Algorithms, Diagnosis