A Study Of Reliability Of The Interconnection Networks In High Performance Computers

Posted on: 2015-01-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: L He  Full Text: PDF
GTID: 1228330422971389  Subject: Computer software and theory
Abstract/Summary:
High performance computers are a class of computing systems that support large-scale applications and process big data. To bring high performance computers into full play, a satisfactory QoS (Quality of Service) is indispensable and, consequently, system reliability must be ensured. With the rapidly increasing number of components, ranging from computing nodes to chips, in high performance parallel computers, the underlying interconnection networks are becoming increasingly large. As a result, the probability that such networks contain failing nodes also rises. In fact, the reliability of a parallel computer largely depends on the reliability of its underlying interconnection network.

The main task of this thesis is twofold: (1) taking connectivity and diagnosability as fault tolerance measures, to study the reliability of interconnection networks, and (2) to design fault diagnosis algorithms for typical interconnection networks. The main contributions are presented below.

1. Although optical interconnects can offer extremely high bandwidth and extremely low power consumption, electric interconnects outperform optical interconnects when the transmission distance is only on the order of millimeters. To make full use of both electric and optical interconnects, an opto-electric interconnection network, known as the optical transpose interconnect system or simply OTIS, has been suggested previously. It is well known that an OTIS with n^2 nodes has a connectivity of n. In this thesis, we show that (1) when n is even, the connectivity of OTIS can be enhanced if a small set of new edges is added to the original OTIS, and (2) when the number of faulty nodes in OTIS is bounded by a threshold, there is a giant connected subgraph that can continue to carry out the system's tasks.

2. The simultaneous failure of all neighboring nodes of any node in an interconnection network is a small-probability event and, hence, can be ignored. This leads to the introduction of conditional connectivity as a new measure of fault tolerance of interconnection networks. The hypermesh network with k^n nodes has many desirable features and, hence, has received considerable attention. For instance, it was reported recently that any node has a neighboring failure-free g-node. This thesis studies the scale of the neighboring failure-free g-node in hypermesh (i.e., the size of the resulting maximal conditional connected subgraphs).

3. Compared with traditional precise diagnosis, pessimistic diagnosis greatly increases the self-diagnostic ability of a system at the cost of possible misdiagnosis of a fault-free node. As a promising optical network, the optical multi-mesh hypercube (OMMH) network integrates the excellent topological properties of hypercubes and meshes. Under the PMC model, the precise diagnosability of OMMH is already known. In this thesis, we prove that the pessimistic diagnosability of OMMH is twice its precise diagnosability, and we then design an efficient pessimistic diagnosis algorithm for OMMH networks.

4. Folded hypercubes are a class of regular interconnection topologies with some excellent topological properties that hypercubes of the same dimension lack. This thesis proves that the pessimistic diagnosability of the folded hypercube network is twice its precise diagnosability. Furthermore, a pessimistic diagnosis algorithm with linear time complexity is presented.

To sum up, this thesis has studied the fault tolerance of two classes of interconnection networks and has proposed efficient pessimistic diagnosis algorithms for two classes of interconnection networks. These findings benefit the practicality of the above-mentioned interconnection networks.
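
As a rough illustration of the objects named in contributions 3 and 4, the following Python sketch builds a folded hypercube (the standard definition: a hypercube plus one extra edge joining each node to its bitwise complement) and generates a test syndrome under the usual PMC assumptions (a fault-free tester reports its neighbor's true state, while a faulty tester's report is unreliable). The function names `folded_hypercube` and `pmc_syndrome` are hypothetical, and this is not the diagnosis algorithm of the thesis, only a minimal sketch of the setting it works in.

```python
import random


def folded_hypercube(n):
    """Return adjacency sets of the n-dimensional folded hypercube (n >= 2).

    Nodes are the integers 0 .. 2**n - 1, read as n-bit labels.
    """
    mask = (1 << n) - 1
    adj = {}
    for v in range(1 << n):
        # Ordinary hypercube edges: flip exactly one bit.
        nbrs = {v ^ (1 << i) for i in range(n)}
        # Complementary edge: flip all n bits at once.
        nbrs.add(v ^ mask)
        adj[v] = nbrs
    return adj


def pmc_syndrome(adj, faulty, seed=0):
    """Return a syndrome {(tester, tested): 0 or 1} under the PMC model.

    A fault-free tester reports 1 exactly when the tested node is faulty.
    A faulty tester reports an arbitrary value (modelled here as random).
    """
    rng = random.Random(seed)
    syndrome = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            if u in faulty:
                syndrome[(u, v)] = rng.randint(0, 1)  # unreliable result
            else:
                syndrome[(u, v)] = 1 if v in faulty else 0
    return syndrome


if __name__ == "__main__":
    adj = folded_hypercube(4)
    syndrome = pmc_syndrome(adj, faulty={3})
    accusers = sorted(u for (u, v), r in syndrome.items() if v == 3 and r == 1)
    print("nodes reporting node 3 as faulty:", accusers)
```

For n >= 2 every node of the folded hypercube has n + 1 neighbors, one more than in the ordinary hypercube of the same dimension. A diagnosis algorithm takes a syndrome such as the one above as input and tries to recover the faulty set from it.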
Keywords/Search Tags: high performance computer, interconnection network, reliability, conditional connectivity, pessimistic diagnosis