
Research On Checkpoint Subsystem For Linux SSI Cluster

Posted on: 2012-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: H L Li
Full Text: PDF
GTID: 2218330368982083
Subject: Computer application technology
Abstract/Summary:
In military, commercial, and natural-science applications, computers must be highly reliable. Improving reliability matters both for the correctness of computation and for the stability of continuous operation. In the field of high-availability computing, single system image (SSI) clusters built with cluster technology are gradually becoming prevalent. An SSI cluster transparently provides strong fault tolerance and a single system view, which ensures high reliability while making the cluster more convenient for users to access.
Most current high-availability clusters adopt checkpoint technology implemented at user level; however, such implementations lack transparency and have many functional limitations. A checkpoint tool implemented at system level can save checkpoints transparently at process granularity. Because all relevant data and process state are easy to access in kernel space, it also makes possible a full-featured checkpoint component that saves and restarts more efficiently. In addition, existing clusters that use checkpoint-based fault tolerance generally just integrate open-source components and apply a coordinated checkpoint algorithm to build the cluster's checkpoint subsystem, without designing an efficient global checkpoint algorithm for cluster applications.
Although global checkpoint algorithms for distributed environments have been studied extensively, each has its own limitations: efficiency drops as the system scale or the number of inter-process messages grows, so they are unsatisfactory in practice.
In this paper, the structure and function of the cluster checkpoint subsystem are analyzed from the perspective of cluster software architecture. Building on research into single-machine checkpoint software and global checkpoint algorithms, the paper proposes an incremental optimization strategy based on the Linux kernel and a checkpoint algorithm based on communication unit partitioning. The algorithm offers static and dynamic partitioning strategies; under the dynamic strategy, it uses a heuristic multi-level graph partitioning method to form communication units. In keeping with the characteristics of each approach, it applies a coordinated checkpoint algorithm inside a communication unit and an uncoordinated checkpoint algorithm between communication units. By combining the advantages of both, the algorithm maintains good scalability and low time and space overhead as the system scale or the number of inter-process messages increases, and the Linux-kernel-based incremental checkpoint strategy further reduces space overhead. Together they are well suited to implementing checkpoint saving and rollback recovery for a single system image cluster. A comparison of simulation results shows that the checkpoint algorithm based on communication unit partitioning has lower overhead and improves performance in both the checkpoint and restart phases, making it suitable for building an efficient cluster checkpoint subsystem.
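The idea of communication unit partitioning can be illustrated with a minimal sketch. It is not the thesis's algorithm: a simple greedy merge of the most heavily communicating process pairs stands in for the heuristic multi-level graph partitioning, and the function names, the `message_counts` structure, and the `max_unit_size` limit are all hypothetical. It only shows why grouping chatty processes keeps most messages inside a unit (where coordinated checkpointing applies) and leaves few messages crossing unit boundaries (where uncoordinated checkpointing applies).

```python
from collections import defaultdict


def partition_into_units(message_counts, max_unit_size):
    """Group processes into communication units (hypothetical helper).

    message_counts maps a pair (proc_a, proc_b) to the number of messages
    exchanged. Greedy stand-in for graph partitioning: merge the heaviest
    pairs first, never letting a unit exceed max_unit_size processes.
    """
    parent, size = {}, {}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in message_counts:
        for p in (a, b):
            if p not in parent:
                parent[p] = p
                size[p] = 1

    # Visit edges from heaviest to lightest; ties are broken by pair name.
    ordered = sorted(message_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), _count in ordered:
        ra, rb = find(a), find(b)
        if ra != rb and size[ra] + size[rb] <= max_unit_size:
            parent[rb] = ra
            size[ra] += size[rb]

    units = defaultdict(set)
    for p in parent:
        units[find(p)].add(p)
    return sorted(sorted(u) for u in units.values())


def inter_unit_messages(message_counts, units):
    """Count messages that cross unit boundaries (the uncoordinated part)."""
    unit_of = {p: i for i, unit in enumerate(units) for p in unit}
    return sum(c for (a, b), c in message_counts.items()
               if unit_of[a] != unit_of[b])
```

With four processes where `p0`/`p1` and `p2`/`p3` exchange most of the traffic, a unit size limit of 2 groups each chatty pair together, so only the light traffic between the two pairs falls under the uncoordinated scheme.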
Keywords/Search Tags:single system image, fault tolerance, checkpoint, communication unit, graph partitioning