
Research And Implementation Of A Failure Detection System With Composite Structure

Posted on: 2013-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: M Yang
GTID: 2248330362470882
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, information systems are widely used in key fields such as telecommunications, aerospace, and the military. Effective failure detection systems urgently need to be researched and designed in order to make military information systems reliable. Compared with ordinary information systems, military information systems have three distinguishing characteristics. First, they are usually large in scale. Second, the detection service must remain available even when several nodes fail at the same time. Third, they carry heavy workloads under high load. Typical single-structure failure detection models are suited to theoretical research or small systems, and show obvious defects when applied to military information systems.

To address this, taking a military command system as the background, the performance requirements of failure detection in this field were analyzed with respect to timeliness, continuity, and extensibility, and the limitations of typical failure detection models in various scenarios were studied and summarized. A failure detection model with a composite structure was then proposed. Under this model, the system was divided into groups of equal size, and separate failure detection algorithms were designed for use within groups and between groups.

Within a group, a ring algorithm with a fast reconstruction mechanism was used for node-level failure detection. Each node communicated only with its predecessor and its successor, and judged the status of its predecessor from the heartbeat messages received from it. When a node found that its predecessor had failed, it sent the failure information to its successor, and the information was passed along the ring node by node. At the same time, it sent a reconstruction request to the indirect predecessor (the node preceding the failed one). The algorithm retained the simple structure and low load of the typical ring algorithm, satisfied the continuity requirement by performing reconstruction and failure propagation in parallel, and guaranteed a degree of timeliness by limiting the group size.

Between groups, a hierarchical structure was used, and an algorithm was designed to detect failures at the group level. Each upper-layer node was responsible for detecting one lower-layer group and shared the information with the other nodes in its own group. Viewed as a whole, each node at a higher level took charge of detecting a set of nodes at the lower level. While maintaining good extensibility, each unit in the hierarchy was extended to a group, thereby reducing the probability of a single point of failure.

Finally, the module design and process design of the failure detection system were presented on the basis of a practical project. The failure detection system was implemented on the VxWorks real-time operating system, then applied and verified in a practical application system.
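
The intra-group ring detection described above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the thesis implementation (which ran on VxWorks): the names `Node`, `RingGroup`, `send_heartbeats`, and `detect`, the fixed timeout, and the discrete time steps are all hypothetical, and the parallel reconstruction and propagation steps are performed one after the other here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    alive: bool = True
    last_heartbeat: float = 0.0  # arrival time of the latest heartbeat from the predecessor
    known_failed: set = field(default_factory=set)


class RingGroup:
    """Hypothetical model of one group; ring[i - 1] is the predecessor of ring[i]."""

    def __init__(self, node_ids, timeout):
        self.timeout = timeout  # assumed fixed heartbeat timeout
        self.ring = list(node_ids)
        self.nodes = {i: Node(i) for i in self.ring}

    def predecessor(self, node_id):
        return self.ring[self.ring.index(node_id) - 1]

    def successor(self, node_id):
        return self.ring[(self.ring.index(node_id) + 1) % len(self.ring)]

    def send_heartbeats(self, now):
        # Every live node sends a heartbeat to its successor only.
        for nid in self.ring:
            if self.nodes[nid].alive:
                self.nodes[self.successor(nid)].last_heartbeat = now

    def detect(self, now):
        """Each live node checks its predecessor; returns the ids detected as failed."""
        detected = []
        for nid in list(self.ring):
            node = self.nodes[nid]
            if not node.alive or len(self.ring) < 2:
                continue
            if now - node.last_heartbeat > self.timeout:
                failed = self.predecessor(nid)
                # Reconstruction: drop the failed node so the indirect
                # predecessor becomes the new predecessor, and restart the timer.
                self.ring.remove(failed)
                node.last_heartbeat = now
                # Propagation: pass the failure information around the ring.
                self._propagate(nid, failed)
                detected.append(failed)
        return detected

    def _propagate(self, detector, failed):
        nid = detector
        while True:
            if self.nodes[nid].alive:
                self.nodes[nid].known_failed.add(failed)
            nid = self.successor(nid)
            if nid == detector:
                break


if __name__ == "__main__":
    group = RingGroup(range(5), timeout=2.0)
    group.send_heartbeats(0.0)
    group.nodes[2].alive = False  # node 2 fails silently
    for t in (1.0, 2.0, 3.0):
        group.send_heartbeats(t)
        print(t, group.detect(t), group.ring)
```

In this sketch, if the new predecessor has also failed, the detector simply times out again on the next round and repeats the reconstruction, which is one simple way to keep detection running when several nodes fail together. Because failure information travels node by node, the worst-case propagation delay grows with the ring length, which is consistent with the abstract's point that timeliness is bounded by limiting the group size.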
Keywords/Search Tags: Fault Tolerance, Disaster Tolerance, Failure Detection Model, Hybrid Structure, Diagnostic Delay, Extensibility