Design And Implementation Of Multi-machine Fault-tolerant System On Linux

Posted on:2008-11-21

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Zhang

Full Text:PDF

GTID:2178360242967552

Subject:Computer application technology

Abstract/Summary:

With the pervasive use of computer technology and the Internet, people rely on computer systems more and more. Critical business systems require highly available computers to ensure the continuity of their applications. An application system therefore needs the ability to heal itself and keep running as long as possible, so that services are not interrupted by faults of any kind. For small-scale applications, dual-machine fault-tolerant technology is commonly used; it tolerates faults well with a small investment. As transaction volume grows, however, applications demand more computing power than a dual-machine system can provide, and dual-machine setups give way to multi-machine systems, which scale much better. This area has therefore attracted wide interest and more effort than ever before; for example, the open-source projects LVS and Linux-HA, developed by their communities, are widely used in industry.

Against this background, this paper proposes a multi-machine fault-tolerant system that runs on Linux. Through the cooperation of all the computers, it provides two levels of fault tolerance: protection of application processes and protection of computing nodes. If a server process exits abnormally, the system detects it and cooperates with the other machines to take over the failed service. Likewise, when the heartbeat detection protocol notices that a node has failed, the remaining nodes run a distributed election algorithm to choose an agent that takes full responsibility for the failover and restores the service as quickly as possible. The system organizes the nodes in a loosely coupled structure, so it scales well: any node can join or leave the cooperation at any time.
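
The abstract does not specify which distributed election algorithm is used. The sketch below illustrates the general idea under a stated assumption: every surviving node holds the same membership view and applies the same deterministic rule (here, the lowest surviving node ID wins), so all nodes agree on the failover agent without further negotiation. The class `Cluster` and its methods are hypothetical names chosen for illustration, not the thesis's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical membership view that every node keeps a copy of."""
    alive: set[int] = field(default_factory=set)
    # Maps service name -> ID of the node currently running it.
    services: dict[str, int] = field(default_factory=dict)

    def join(self, node_id: int) -> None:
        # Loose coupling: a node may join the cooperation at any time.
        self.alive.add(node_id)

    def leave(self, node_id: int) -> None:
        # A node may also leave (or be removed after a failure) at any time.
        self.alive.discard(node_id)

    def elect_agent(self) -> int:
        # ASSUMED rule: the lowest-numbered surviving node is the agent.
        # Because every node applies the same rule to the same view,
        # they all pick the same agent.
        if not self.alive:
            raise RuntimeError("no surviving nodes")
        return min(self.alive)

    def on_node_failure(self, failed: int) -> int:
        """Drop the failed node and hand its services to the elected agent."""
        self.leave(failed)
        agent = self.elect_agent()
        for name, owner in self.services.items():
            if owner == failed:
                # Reassigning an existing key is safe during iteration.
                self.services[name] = agent
        return agent


if __name__ == "__main__":
    cluster = Cluster()
    for node in (1, 2, 3):
        cluster.join(node)
    cluster.services = {"web": 1, "db": 3}
    agent = cluster.on_node_failure(1)
    print(agent, cluster.services)
```

A fixed ordering rule keeps the example short; a real implementation must also cope with nodes whose membership views temporarily disagree, which this sketch does not address.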

To detect computing-node failures quickly and accurately, this paper designs and implements a dedicated heartbeat detection protocol that runs in the kernel. Because the protocol runs as a network protocol entity, it is not subject to the process-scheduling delays that affect application processes, so it detects node failures with a shorter delay than a user-mode implementation.
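
For contrast, a minimal user-space timeout detector is sketched below; it is an illustration of the general heartbeat technique, not the thesis's in-kernel protocol. A peer is suspected failed when no heartbeat has arrived within `timeout` seconds. The class name, the `timeout` parameter, and the injectable `clock` are assumptions made for the example. In user mode, `failed_nodes()` only runs when the monitoring process is scheduled, so scheduling latency adds to the effective detection delay; that is the delay the kernel-level design avoids.

```python
import time
from collections.abc import Callable


class HeartbeatMonitor:
    """Illustrative timeout-based failure detector (user space, hypothetical)."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout      # assumed value, e.g. a few heartbeat intervals
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen: dict[int, float] = {}

    def on_heartbeat(self, node_id: int) -> None:
        # Record the arrival time of the latest heartbeat from this node.
        self.last_seen[node_id] = self.clock()

    def failed_nodes(self) -> list[int]:
        # A node is suspected failed if its last heartbeat is older than timeout.
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)
```

Injecting the clock keeps the detection rule separate from real time, so the same logic can be driven by a simulated clock in tests.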

Keywords/Search Tags:

High Availability, Heartbeat Detection, Fault-tolerant

Related items

1	Design And Implementation Of Cluster Fault-tolerant System
2	Fault-tolerant Software Design And Implementation Based On Fault-tolerant Computer System
3	Research And Implementation Of Key Technologies On High-Availability Cluster System
4	Research And Implementation Of High Availability's Key Technology In High Performance Router Software
5	Research Of AST3 System High Availability
6	Design And Implementation Of Distributed Multi-machine Fault-tolerant System
7	Design And Implementation Of Computer Availability Modeling And Assessment Tool
8	Design Of A High-availability SDN Architecture And Its Key Technology Research
9	The Study And Design Of High Availability Monitoring Subsystem For Fault Tolerant Computing Systems
10	Multi-machine Cluster Heartbeat