Font Size: a A A

Reversed Bully Election Algorithm And The Application On The State Check In Clusters

Posted on:2010-01-28Degree:MasterType:Thesis
Country:ChinaCandidate:W LiFull Text:PDF
GTID:2178360272495893Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
While the increasing popularization of the internet promoted the dramatic development in providing commercial services via internet and the large-scale information processing, the higher demand was put forward for the availability of computer systems. The High Available Cluster can effectively prevent the system failure caused by the single fault to maintain providing services without any downtime for a long period, which greatly improved the availability of the system.A basic and key technology in the High Available Cluster is detecting the failure nodes. That is, to carry out the availability, the system has to real-time check the state of the network, the nodes, the software and the storage system. Once a failure node is detected, the tasks in the failure node will be migrated to other normal nodes. Heartbeat is one of techniques that are popular and effective in detecting failure nodes. In present, Heartbeat for two nodes is mature. In order to integrate more business into one system and make the high available environment support more servers, a cluster with multiple nodes needs to be built. However, few domestic manufactures can develop a cluster with multiple nodes. The high-end market that mainly consists of clusters with multiple nodes is monopolized by the manufactures abroad.In the Heartbeat for multiple nodes, in order to reduce the network loads, one node which acts as a server overall manages the detection in the whole system. When this server fails, a new server must be elected to undertake the task of the management. The election of the new server becomes a key technique. The server can also be called the"Coordinator"in the system. The coordinator election is a basic problem in distributed computing. Many election algorithms in different networks and systems have been proposed. Bully Election Algorithm is a classical election algorithm in the distributed system. After proposed by Garcia-Molina, the Bully Election Algorithm was modified in different aspects.Analyzed the Bully Algorithm and the modified algorithms, the new algorithm was not modified by reducing the complexity of messages transmission in one election. After analyzing, the author found out two common features between the Bully Election Algorithm and the Modified Algorithms. First, they both take the unique identification number of the node as a key factor in the coordinator election. If the nodes with bigger identification number are active, the nodes with smaller identification number will not be elected to be the coordinator. Second, they both consider one election and the aim is to reduce the number of message transmission in one election. Thus, we found that, if the number of the election could be reduced, the total number of message transmission in a period would be decreased dramatically and the system would provide services without downtime for a longer time.The flow of the Bully Election Algorithm indicates that, after several elections in a period, if the node with the biggest identification number is active or can recover from the failure, it would be the coordinator of the system in the end. However, for the computer, the increasing of the time and the workload inevitably could make the performance and the health status decrease. The direct represent or the result of the bad health is that the node fails. Because the computer as a coordinator has to coordinate the checking tasks of all nodes, it will undertake more loads than other normal nodes. Therefore, if a node is elected to be a coordinator frequently, its failure frequency will probably increase and the frequency of the election will also increase.In the thesis, reversing the idea of the Bully Election Algorithm, a new election algorithm based on dynamic priorities called Reversed Bully Election Algorithm was proposed. The basic idea of the new election algorithm is that, the node with the best health status is the most suitable to be the coordinator, so in each election the node with the best health status is bullied by other nodes to be the new coordinator. Thus, the possibility of failure will reduce for the node with the comparatively better health status and the times of elections will also decrease. In the algorithm, each node is given a priority number, which changes dynamically with the health status of the node, and the election is run according to the priority number of all nodes. For different ways of determining the priority number, two kinds of Reversed Bully Election Algorithm were proposed in this thesis. The processes of recovering the failed nodes and initiating the new nodes were redesigned, the idea of which is that, if the coordinator is active, the other nodes is not allowed to be the new coordinator by occupying the position of the old coordinator however the priority of the node is. It also will decrease the election times. After the description of the new algorithm, the process of recovering the failed nodes and initiating the new nodes was discussed. This meets the scalability of the cluster.The performance was evaluated after the Reversed Bully Election Algorithm proposed. Firstly, the validity of the new algorithm as an election algorithm was proved in theory. Then, the availability of the new algorithm was proved by the simulation experiment and the result showed that the new algorithm greatly reduced the number of message transmission in the network inside the system compared Bully Election Algorithm. It also showed that the difference of the number of message transmission between the two election algorithm was going more obviously with the increase of the system running time.The new algorithm still has some disadvantages. The process of the dynamic priorities updating would result in the receiving lag of the dynamic priorities of all nodes. But it could not seriously affect the results. The Hypothesis 7 and 8 are a little bit strong, and they would happen in the real systems. So we would release these two hypothesizes in the future so that the new algorithm could adapt the real system environment.In the end, an architecture model of a cluster for data search was designed after analyzing the demands. The Reversed Bully Election Algorithm was applied on designing the function of the service state check, and for multiple nodes of the server pool in the cluster, a Heartbeat model for multiple nodes was designed. The effectiveness of the Heartbeat model for multiple nodes would be improved by using the proposed Reversed Bully Election Algorithm.
Keywords/Search Tags:Bully Election Algorithm, Reversed Bully Election Algorithm, Election Algorithm, Cluster, State Check
PDF Full Text Request
Related items