Design And Implementation Of Cluster Fault-tolerant System

Posted on:2009-01-24

Degree:Master

Type:Thesis

Country:China

Candidate:X Li

Full Text:PDF

GTID:2178360272470527

Subject:Computer application technology

Abstract/Summary:

In research on high-performance computing, the first problem is how to keep computers reliable and available. Clusters are the mainstream architecture for high-performance computing because of their low cost and good scalability, and the loose coupling between nodes makes high availability easier to achieve in a cluster than in a centralized system. As clusters grow larger, however, new problems arise. The purpose of this thesis is to increase the availability of clusters.

This thesis proposes a cluster fault-tolerant system made up of four modules: a user module, a center module, a process control module, and a heartbeat module. The four modules cooperate to provide the system's functionality, and each is described in detail in the thesis. The system organizes nodes in a loosely coupled structure. It can heal itself and keep running as long as possible, avoiding service interruptions that faults might otherwise cause. It is also highly extensible: any node can join or leave the cooperating group at any time. The system provides two levels of fault tolerance.

Heartbeat mechanisms are the most common technique for achieving reliable communication in high-availability systems. To detect the failure of computing nodes quickly and accurately, this thesis designs a new real-time heartbeat that can be dynamically loaded into the Linux kernel. Running in the kernel avoids the influence of process scheduling and detects node failures with a shorter delay than a user-mode implementation. The thesis uses the netlink connector to detect process failures: the exit of a monitored process is treated as abnormal. When the heartbeat detection protocol notices that a node has failed, the remaining nodes run a distributed election algorithm to choose an agent that takes full responsibility for the failover, and the agent restarts the failed processes to maintain system availability.
The availability and robustness of the system are improved to a certain extent.
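The detection and failover flow summarized in this abstract can be illustrated with a small simulation. This is a hypothetical Python sketch, not the thesis's implementation: the class `HeartbeatMonitor`, the functions `elect_agent` and `handle_failures`, the timeout value, and the "lowest live node ID becomes the agent" election rule are all assumptions made for illustration. It models only the user-level logic (timeout-based node failure detection, agent election, and reassignment of the failed node's processes). The kernel-resident heartbeat and the netlink-connector process monitoring are not modeled, and the sketch assumes every surviving node shares the same membership view, which a real distributed election protocol would have to establish.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical model of per-node heartbeat bookkeeping."""

    timeout: float  # assumed: seconds without a heartbeat before a node counts as failed
    last_seen: dict[int, float] = field(default_factory=dict)

    def join(self, node_id: int, now: float) -> None:
        # Nodes may join at any time; joining counts as the first heartbeat.
        self.last_seen[node_id] = now

    def leave(self, node_id: int) -> None:
        # Forget a node (used for voluntary leaves and after failover).
        self.last_seen.pop(node_id, None)

    def heartbeat(self, node_id: int, now: float) -> None:
        # Record the arrival time of the latest heartbeat from node_id.
        self.last_seen[node_id] = now

    def failed_nodes(self, now: float) -> list[int]:
        # A node is failed once its last heartbeat is older than `timeout`.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)

    def live_nodes(self, now: float) -> list[int]:
        # Nodes whose last heartbeat is within `timeout` are considered alive.
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout)


def elect_agent(live: list[int]) -> int | None:
    # Assumed election rule: the live node with the lowest ID becomes the agent.
    return min(live) if live else None


def handle_failures(
    monitor: HeartbeatMonitor, processes: dict[int, list[str]], now: float
) -> list[tuple[str, int]]:
    """Reassign each failed node's processes to the elected agent.

    Returns (process, agent) pairs, i.e. which processes the agent should restart.
    """
    agent = elect_agent(monitor.live_nodes(now))
    if agent is None:
        return []  # no survivors, so nothing can take over
    plan: list[tuple[str, int]] = []
    for node in monitor.failed_nodes(now):
        for proc in processes.pop(node, []):
            plan.append((proc, agent))
            processes.setdefault(agent, []).append(proc)
        monitor.leave(node)  # drop the failed node from the membership
    return plan


if __name__ == "__main__":
    mon = HeartbeatMonitor(timeout=3.0)
    for n in (1, 2, 3):
        mon.join(n, now=0.0)
    procs = {1: ["sched"], 2: ["db"], 3: ["web"]}
    # Nodes 2 and 3 keep sending heartbeats; node 1 has gone silent.
    mon.heartbeat(2, now=4.0)
    mon.heartbeat(3, now=4.0)
    print(handle_failures(mon, procs, now=4.0))  # → [('sched', 2)]
```

Choosing the lowest live ID makes the election deterministic: every survivor with the same membership view picks the same agent without exchanging extra messages. If views can diverge (for example, during a network partition), a message-based election such as the bully algorithm would be needed instead.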

Keywords/Search Tags:

High Availability, Heartbeat Detection, Fault-tolerant

Related items

1	Design And Implementation Of Cluster Fault-tolerant System
2	Fault-tolerant Software Design And Implementation Based On Fault-tolerant Computer System
3	Research And Implementation Of Key Technologies On High-Availability Cluster System
4	Research And Implementation Of High Availability's Key Technology In High Performance Router Software
5	Research Of AST3 System High Availability
6	Design And Implementation Of Distributed Multi-machine Fault-tolerant System
7	Design And Implementation Of Computer Availability Modeling And Assessment Tool
8	Design Of A High-availability SDN Architecture And Its Key Technology Research
9	The Study And Design Of High Availability Monitoring Subsystem For Fault Tolerant Computing Systems
10	Multi-machine Cluster Heartbeat