
Grid Computing Oriented Dynamic Fault Tolerance Service Strategies And Corresponding Algorithms

Posted on: 2008-02-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Tian
Full Text: PDF
GTID: 1118360215990030
Subject: Computer application technology
Abstract/Summary:
Grid computing can break through current computational barriers by integrating large-scale, distributed, free resources to solve complex scientific and engineering problems collaboratively, which greatly advances scientific research and engineering practice. However, because grid systems are highly dynamic and heterogeneous, they are more prone to failures. Frequent failures have become a major problem that troubles many scientists, engineers, and users. How to improve the reliability and robustness of grids by introducing an appropriate fault tolerance mechanism is one of the most difficult issues in the literature.

Based on comparative research, the dissertation summarized the fault tolerance requirements of grid computing and constructed a suite of dynamic fault tolerance service strategies. It also presented the corresponding algorithms: adaptive fault detection algorithms, QoS-based fault handling service selection algorithms, and efficient fault recovery algorithms. The main contents are as follows.

① Based on the characteristics of grids, the author summarized the special fault tolerance requirements of the grid environment. Taking users' QoS demands into account, the author gave extended definitions of process fault, processor fault, and network fault in grid environments. The architecture of the dynamic fault tolerance service, which includes grid fault detection and fault management services, was then designed, and the fault tolerance service strategies were proposed.

② Because grids are more prone to failures and existing failure detection algorithms cannot efficiently satisfy the fault detection requirements of grid computing, the dissertation presented a suite of adaptive fault detection algorithms.
Following the characteristics of grid systems and building on the theory of unreliable fault detection, the author combined a heartbeat strategy with grey prediction theory to design a dynamic heartbeat mechanism, presented the prediction model and a real-time prediction strategy, and then proposed a fault detection algorithm between grid processes. In addition, using an active network approach, a dynamic layered organization algorithm for fault detectors was presented, and its performance was analyzed theoretically. Finally, simulation results demonstrated the correctness and effectiveness of the algorithm.

③ To address the problem of selecting a fault handling service for different grid applications, the dissertation put forward a fault handling service selection algorithm based on clients' QoS requirements. After formally defining several common fault handling techniques, the author presented an extensible QoS model for grid fault handling. The QoS-based decision problem was abstracted as a multi-attribute decision problem, and a decision model was constructed. To overcome the shortcomings of purely subjective or purely objective weighting, the author solved the model with a combined subjective-objective weighting method. Finally, the QoS-based fault handling service selection algorithm was described, and its correctness and effectiveness were demonstrated by simulation.

④ Because the participating hosts are widely dispersed, message transmission latency is very large, and traditional methods cannot meet the failure recovery requirements of grids, the author presented a suite of adaptive fault recovery algorithms built on a message logging protocol.
First, because the bandwidth between any two nodes is not fixed and the internal infrastructure of the system is highly variable, an adaptive optimistic message logging protocol for grid systems was designed. Second, in view of the wide-area, large-scale nature of grids, a scalable grid computing model was constructed. The author then presented the adaptive failure recovery algorithm. Finally, through theoretical analysis and simulation, the author demonstrated the correctness and effectiveness of the protocol and the algorithm.

In summary, based on the fault tolerance requirements of grid computing, this dissertation proposed a suite of dynamic fault tolerance service strategies and a series of corresponding algorithms. Theoretical analysis and simulation indicate that the strategies and algorithms are correct and effective, can be used in grid computing environments, and help improve the reliability of grid systems.
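
The abstract does not give the prediction model behind the dynamic heartbeat mechanism, so the following is only a minimal sketch of how grey prediction could drive an adaptive heartbeat timeout. It assumes the standard GM(1,1) grey model applied to recent heartbeat inter-arrival times; the function names `gm11_next` and `is_suspected` and the `margin` parameter are hypothetical and are not taken from the dissertation.

```python
import math


def gm11_next(samples):
    """Predict the next value of a positive series with a GM(1,1) grey model.

    Assumption: `samples` holds recent heartbeat inter-arrival times.
    """
    n = len(samples)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 samples")

    # Accumulated generating operation (running sum of the raw series).
    x1 = []
    total = 0.0
    for value in samples:
        total += value
        x1.append(total)

    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = samples[1:]

    # Least-squares fit of y = -a * z + b (a: development coefficient).
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    b = (sy - slope * sz) / m
    a = -slope

    if abs(a) < 1e-12:
        # Series is effectively constant; the model degenerates to b.
        return b

    # Time response: x1_hat(k+1) = c * exp(-a*k) + b/a, differenced once
    # to recover the next raw value.
    c = samples[0] - b / a
    return c * (math.exp(-a * n) - math.exp(-a * (n - 1)))


def is_suspected(now, last_arrival, intervals, margin=0.5):
    """Suspect a process if its heartbeat is later than predicted + margin."""
    return now - last_arrival > gm11_next(intervals) + margin
```

With a steady history such as `[1.0] * 5` the predicted interval stays at 1.0, so with the default margin a process is suspected only once more than 1.5 time units pass without a heartbeat. A detector of this kind remains unreliable in the usual sense: a suspicion can be wrong and should be revisable when a late heartbeat arrives.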
Keywords/Search Tags:Grid computing, Fault tolerance computing, Fault detection, Service selection, Fault recovery