
Efficient System-Level Fault Diagnosis Of Multicomputer Systems

Posted on: 2010-04-10  Degree: Doctor  Type: Dissertation
Country: China  Candidate: H Yang  Full Text: PDF
GTID: 1118360275974193  Subject: Computer software and theory
Abstract/Summary:
Multicomputer systems are a popular high-performance computing platform: they offer very high computation speed and are therefore especially useful in computation-intensive applications. As multicomputers grow in size, however, it becomes increasingly likely that some units in the system are faulty, so maintaining high system reliability and availability becomes an urgent task. Specifically, it is essential to detect and locate the failing components within a system and then replace them with spares.

System-level diagnosis provides an effective approach to fault identification in parallel computers. In this approach, a set of tests is conducted and the test outcomes are then interpreted to locate the faulty components. Requiring no dedicated hardware, it can achieve the diagnosis goal automatically, efficiently and economically. The central task of system-level diagnosis is to determine which processors in the system under examination are faulty.

This thesis addresses the system-level diagnosis of multicomputers. Its main contributions are listed below.

1. Compared with the popular precise and pessimistic diagnosis strategies, the t/k diagnosis strategy (k ≥ 2) can significantly improve the self-diagnosing capability of a system, at the expense of a few fault-free processors (at most k) being mistakenly diagnosed as faulty. To our knowledge, no efficient t/k diagnosis algorithm (k ≥ 2) has been reported. The generalized cube networks (GCNs) are a large class of regular topologies that includes the hypercube and some of its variants. It is known that an n-dimensional GCN is t/k-diagnosable for k ≤ n + 1, where t = (k + 1)n − (k + 1)(k + 2)/2 + 1. Under the PMC model, this thesis proposes a (4n − 9)/3 diagnosis algorithm (the case k = 3, for which t = 4n − 9) suited to all n-dimensional GCNs. The algorithm runs in O(N log₂ N) time, where N = 2^n is the total number of processors in the network.

2. Inspired by biological immune theory, artificial immune systems (AISs) have been suggested as a way to tackle complex problems. Under the PMC model, this thesis attempts to carry out system-level diagnosis by using an AIS. For that purpose, an affinity function is defined to measure the resemblance between the input syndrome (the antigen) and the syndrome generated by a potential fault set (the antibody). On this basis, we present an artificial-immune-system based fault identification algorithm. Both theoretical analysis and experimental results demonstrate the effectiveness of this method.

3. Inspired by the foraging behavior of social insects living in colonies, such as ants, termites, wasps and bees, swarm intelligence (SI) has been proposed as an intelligent computing and optimization method. Under three different models, this thesis carries out system-level diagnosis with the aid of SI. First, we introduce the concept of "pheromone", a chemical substance that ants deposit when seeking food, to evaluate the similarity between the given syndrome and the one produced by the currently guessed fault set. Then a swarm-intelligence based diagnosis algorithm is designed for general diagnosable systems. Simulation results confirm the usefulness of the proposed algorithm.

4. The MM* model is a practically useful comparison model for system-level fault diagnosis. The locally twisted cube and the Möbius cube are two important variants of the hypercube. Under the MM* model, we present a diagnosis algorithm for each of these two topologies. Both algorithms have time complexity O(N log₂² N), which is remarkably better than the O(N^5) diagnosis algorithm of Sengupta and Dahbura.

In conclusion, the thesis has studied the system-level diagnosis problem of multicomputers and has presented several efficient diagnosis algorithms. The correctness and effectiveness of these algorithms are either proved theoretically or shown experimentally.
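
As an illustration of the diagnosability bound quoted in contribution 1, the following minimal sketch evaluates t = (k + 1)n − (k + 1)(k + 2)/2 + 1 and checks that k = 3 gives t = 4n − 9, which is where the name "(4n − 9)/3 diagnosis" comes from. The function name t_bound and the sample dimensions are hypothetical choices made for this example; the sketch only evaluates the formula and is not the thesis's diagnosis algorithm.

```python
def t_bound(n: int, k: int) -> int:
    """Fault bound t for t/k-diagnosability of an n-dimensional GCN.

    Evaluates the formula quoted in the abstract,
        t = (k + 1) * n - (k + 1) * (k + 2) / 2 + 1,
    which the abstract states for k <= n + 1.
    """
    if k > n + 1:
        raise ValueError("the quoted bound is stated only for k <= n + 1")
    # (k + 1) * (k + 2) is a product of consecutive integers, hence even,
    # so the integer division below is exact.
    return (k + 1) * n - (k + 1) * (k + 2) // 2 + 1


if __name__ == "__main__":
    k = 3  # the case treated by the (4n - 9)/3 algorithm
    for n in (4, 5, 6):  # sample dimensions, chosen arbitrarily
        t = t_bound(n, k)
        assert t == 4 * n - 9
        print(f"n = {n}: N = 2^n = {2 ** n} processors, t = {t}, k = {k}")
```

For k = 3 the second term is 4·5/2 = 10, so the bound reduces to 4n − 10 + 1 = 4n − 9.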
Keywords/Search Tags:Multicomputer, System-Level Fault Diagnosis Algorithm, PMC Model, Comparison Model, Computational Intelligence