Cluster technology connects multiple servers through cluster software to form a single, highly transparent computer system composed of a large-scale group of servers. The system works as a whole to serve clients, and clients can share all the resources on the network, such as data and applications. The user at the client does not care which server an application is running on, only whether the application can run continuously. When a server in the cluster fails, a standby server takes over its application services and continues to serve the users.

For any type of cluster product, the core function is failure monitoring. The number of resource types that can be monitored and the monitoring level are vital indicators for evaluating the high availability of cluster software. In addition, the servers in a cluster must regularly check one another's health, which is called "heartbeat detection". Heartbeat detection is carried out mainly over the network, on both the private network and the public network, with the latter acting as the backup. Good cluster software should have a self-contained heartbeat detection mechanism so as to avoid improper switching caused by heartbeat timeouts under high load. Heartbeat plays vital roles in cluster software (such as communication between nodes, failure judgment, and event invocation) and is its core component.

Heartbeat refers to sending communication signals between the master system and the standby system at regular intervals to indicate the current running status of each system.
Once the heartbeat signal indicates a failure on the master system, or the standby system cannot receive the heartbeat signal from the master system, the high-availability management software concludes that the master system has failed. It then stops work on the master system and transfers the system resources to the standby system, which takes over from the master system so that network services continue without interruption.

The RoseCluster system of Rose Datasystems Inc. connects multiple servers directly to a disk array system. The user's operating system, application software, and the RoseCluster HA software are installed on the servers, while shared data, such as datasets, is stored in the storage system. The servers are connected by a private heartbeat network. Based on the heartbeat network, the system determines whether the current server needs to be switched to another, so as to ensure business continuity and data continuity.

This thesis studies the design of the heartbeat system in RoseCluster and ways to improve heartbeat transmission technology and heartbeat security, so as to ensure valid detection of server nodes, avoid misoperation, and maintain the continuity of enterprise data.
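
The failover decision described above can be illustrated with a minimal Python sketch. It is not RoseCluster's implementation: the class name `HeartbeatMonitor`, the function `decide`, the channel labels, and the 3-second timeout are all hypothetical, and the rule that the standby declares the master failed only when both the private and the public (backup) heartbeat channels have been silent longer than the timeout is an assumption made for illustration.

```python
class HeartbeatMonitor:
    """Standby-side record of heartbeats received from the master.

    Illustrative only: names, channels, and the decision rule are assumptions.
    """

    CHANNELS = ("private", "public")  # public network acts as the backup

    def __init__(self, timeout: float, started_at: float) -> None:
        self.timeout = timeout  # seconds of silence tolerated per channel
        # Treat the start time as the last heartbeat on every channel,
        # so a freshly started monitor does not report a failure at once.
        self.last_seen = {channel: started_at for channel in self.CHANNELS}

    def record(self, channel: str, now: float) -> None:
        """Note that a heartbeat arrived on the given channel at time `now`."""
        if channel not in self.last_seen:
            raise ValueError(f"unknown heartbeat channel: {channel}")
        self.last_seen[channel] = now

    def channel_alive(self, channel: str, now: float) -> bool:
        """A channel is alive if its last heartbeat is within the timeout."""
        return now - self.last_seen[channel] <= self.timeout

    def master_failed(self, now: float) -> bool:
        """Assumed rule: the master has failed only if all channels are silent."""
        return not any(self.channel_alive(c, now) for c in self.CHANNELS)


def decide(monitor: HeartbeatMonitor, now: float) -> str:
    """Return the action the standby would take at time `now`."""
    return "failover" if monitor.master_failed(now) else "stay"


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=3.0, started_at=0.0)
    monitor.record("private", 1.0)
    monitor.record("public", 1.0)
    monitor.record("public", 4.0)   # private network stops delivering heartbeats
    print(decide(monitor, 5.0))     # public channel still alive
    print(decide(monitor, 8.0))     # both channels silent beyond the timeout
```

Timestamps are passed in explicitly so that the sketch stays deterministic. Requiring silence on every channel before switching is one simple way to reduce the improper switching that a single delayed heartbeat under high load could otherwise trigger.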