Research Of Dynamic Fault Detection And Handling In Grid System

Posted on:2012-08-31

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X B Ji

Full Text:PDF

GTID:1118330338996604

Subject:Computer applications

Abstract/Summary:

With the integration of high-performance computing and Internet technology, Grid systems have developed into an infrastructure for distributed, heterogeneous, and dynamic environments, connecting many kinds of resources above the application layer, providing seamless, reliable, and unified service access interfaces, and achieving transparent access control to hardware, software, data, storage, and other resources. Nevertheless, Grid systems are more prone to failures because of their highly dynamic and heterogeneous characteristics, and the frequent occurrence of failures has become a major problem for scientists, engineers, and users. How to build an appropriate fault-tolerance mechanism that improves the performance of fault detection and handling, and thereby ensures the reliability and stability of the grid, is one of the most difficult issues in Grid systems.

Based on comparative research, the fault tolerance requirements of grid systems were summarized, and a dynamic fault tolerance management strategy was constructed. Moreover, the corresponding dynamic fault detection algorithm and a QoS-restricted fault handling service selection algorithm were presented; finally, a task-level fault-tolerance service system for users was implemented on top of CGSP. The main research contents are as follows.

① According to the characteristics of the Grid, the special fault tolerance requirements of the Grid environment were summarized. The author constructed a fault tolerance architecture comprising a fault detection module, a fault handling module, and a request proxy module, and then described the running process of the model.

② To address the problem that existing fault detection algorithms cannot satisfy the requirements of multi-process fault detection in Grid systems, a dynamic and scalable fault detection algorithm was presented.
The author established a small-world-based grid system model and a fault detection model; combined an unreliable fault detection method with a heartbeat strategy and a grey prediction model; designed a dynamic heartbeat mechanism; and presented the dynamic and scalable fault detection algorithm. The hierarchical architecture of the fault detectors was introduced. The performance of the algorithm, such as its accuracy, completeness, and reliability, was analyzed. Finally, experimental results demonstrated that the algorithm is valid and effective and can be used for fault detection in Grid environments.

③ To address the problem of how to select a fault handling service for different grid applications, the author put forward a QoS-restricted fault handling service selection algorithm. Based on an analysis of the background and requirements of fault handling, formal definitions of several common fault handling technologies were proposed, and a scalable QoS-restricted fault handling model was constructed. The QoS-restricted decision problem was abstracted as a multi-attribute decision problem, and an information-entropy decision method was constructed. The QoS-restricted fault handling service selection algorithm was then put forward, and its correctness and effectiveness were demonstrated by simulation.

④ Building on the research on fault detection and handling, the author proposed the system design and implementation of a fault tolerance management service. The architecture and management process of the CGSP platform were introduced; the design principles of the fault tolerance management service were put forward; and the core system services, namely the request proxy service, the fault detection service, and the fault handling service, were designed and implemented.
Finally, the effectiveness of the fault tolerance mechanisms was demonstrated in a CGSP experimental environment.

To sum up, according to the fault tolerance requirements of Grid services, this dissertation proposed a suite of solutions including a dynamic fault tolerance strategy, fault detection, and fault handling. Theoretical analysis and simulation show that the strategy and the algorithms are correct and effective, can be used for fault detection and handling in Grid environments, and help improve the reliability and stability of Grid systems.
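The second research item combines a heartbeat strategy with a grey prediction model to obtain a dynamic heartbeat mechanism. The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of one plausible reading: a GM(1,1) grey model predicts the next heartbeat inter-arrival interval from a sliding window, and a node is suspected once the predicted arrival time plus a safety margin has passed. The names `gm11_next` and `DynamicHeartbeatMonitor`, the window size, the margin, and the default interval are all hypothetical and are not taken from the dissertation.

```python
import math
from collections import deque


def gm11_next(series):
    """Predict the next value of a non-empty positive series with GM(1,1)."""
    series = list(series)
    n = len(series)
    if n < 4:
        # Assumption: with too few samples, fall back to the plain mean.
        return sum(series) / n
    # Accumulated generating sequence x1 and its adjacent means z.
    x1, total = [], 0.0
    for value in series:
        total += value
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = series[1:]
    m = n - 1
    # Least-squares fit of y[k] = -a * z[k] + b (2x2 normal equations).
    s_z = sum(z)
    s_zz = sum(v * v for v in z)
    s_y = sum(y)
    s_zy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * s_zz - s_z * s_z
    if abs(det) < 1e-12:
        return s_y / m
    a = (s_z * s_y - m * s_zy) / det
    b = (s_zz * s_y - s_z * s_zy) / det
    if abs(a) < 1e-9:
        # Development coefficient near zero: the series is effectively flat.
        return b
    c = series[0] - b / a

    def x1_hat(k):
        return c * math.exp(-a * k) + b / a

    # The next raw value is the difference of two accumulated predictions.
    return x1_hat(n) - x1_hat(n - 1)


class DynamicHeartbeatMonitor:
    """Heartbeat monitor whose timeout follows the predicted interval."""

    def __init__(self, window=8, margin=0.5, default_interval=1.0):
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.margin = margin                   # hypothetical safety margin
        self.default_interval = default_interval
        self.last_arrival = None

    def record(self, arrival_time):
        """Register a heartbeat received at arrival_time (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(arrival_time - self.last_arrival)
        self.last_arrival = arrival_time

    def deadline(self):
        """Latest time by which the next heartbeat is expected."""
        if self.last_arrival is None:
            raise ValueError("no heartbeat recorded yet")
        if self.intervals:
            predicted = max(gm11_next(self.intervals), 0.0)
        else:
            predicted = self.default_interval
        return self.last_arrival + predicted + self.margin

    def suspects(self, now):
        """Return True if the monitored node should be suspected at 'now'."""
        return now > self.deadline()
```

In this sketch a monitor that has seen heartbeats at a steady pace keeps a tight deadline, while a drift toward longer intervals stretches the deadline instead of immediately raising a false suspicion. The fixed margin is a simplification; a real detector would likely tune it against the desired accuracy and detection time.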
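The third research item treats QoS-restricted service selection as a multi-attribute decision problem solved with an information-entropy method. The abstract gives no formulas, so the sketch below uses the standard entropy weight method as a stand-in: candidates that violate the QoS restriction are filtered out, the remaining attribute values are min-max normalized, attribute weights are derived from column entropies, and the candidate with the highest weighted score is chosen. The function names, the attribute set (time, cost, reliability), and the sample values are hypothetical illustrations, not data from the dissertation.

```python
import math


def normalize(matrix, is_benefit):
    """Min-max normalize each column to [0, 1]; larger is always better."""
    norm_cols = []
    for col, benefit in zip(zip(*matrix), is_benefit):
        lo, hi = min(col), max(col)
        if hi == lo:
            # Identical values carry no information; use a constant column.
            norm_cols.append([1.0] * len(col))
        elif benefit:
            norm_cols.append([(v - lo) / (hi - lo) for v in col])
        else:
            norm_cols.append([(hi - v) / (hi - lo) for v in col])
    return [list(row) for row in zip(*norm_cols)]


def entropy_weights(norm):
    """Entropy weight method; expects at least two rows (alternatives)."""
    m, n = len(norm), len(norm[0])
    divergences = []
    for j in range(n):
        col = [row[j] for row in norm]
        total = sum(col)
        probs = [v / total for v in col] if total > 0 else [1.0 / m] * m
        # Terms with p == 0 are skipped, following the 0 * ln 0 = 0 convention.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergences.append(max(0.0, 1.0 - entropy))
    d_sum = sum(divergences)
    if d_sum <= 1e-12:
        return [1.0 / n] * n  # no attribute discriminates: equal weights
    return [d / d_sum for d in divergences]


def select_service(candidates, is_benefit, feasible):
    """Pick the feasible candidate with the highest entropy-weighted score."""
    names = [name for name, values in candidates.items() if feasible(values)]
    if not names:
        return None
    if len(names) == 1:
        return names[0]
    norm = normalize([candidates[name] for name in names], is_benefit)
    weights = entropy_weights(norm)
    scores = [sum(w * r for w, r in zip(weights, row)) for row in norm]
    return names[scores.index(max(scores))]


# Hypothetical fault handling services: (time, cost, reliability).
services = {
    "retry": (10.0, 1.0, 0.90),
    "checkpoint": (6.0, 3.0, 0.95),
    "replication": (4.0, 8.0, 0.99),
}
# Time and cost are cost-type attributes; reliability is benefit-type.
attribute_is_benefit = (False, False, True)
# Hypothetical QoS restriction: cost must not exceed 5.
chosen = select_service(services, attribute_is_benefit, lambda v: v[1] <= 5.0)
```

The entropy weights give more influence to attributes on which the candidates differ most, so no hand-tuned weights are required. Whether the dissertation uses exactly this weighting scheme is not stated in the abstract.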

Keywords/Search Tags:

Grid system, Fault tolerance management, Fault detection, Fault handling

Related items

1	Grid Computing Oriented Dynamic Fault Tolerance Service Strategies And Corresponding Algorithms
2	Research On Adaption Method Of Cloud Fault Tolerance Services Based On User Requirement And Resource Constriction
3	Online fault detection and post-fault switching strategies to improve the fault-tolerance of matrix converters
4	The Research Of Redundancy And Fault-Tolerant Technology Based On Real-Time Operation System
5	Research On Fault Tolerance For Transactional Memory System
6	Design Theory Of Inertial-based Intelligent Fault-tolerant Integrated Navigation Key Issues Study
7	Study On Fault-Tolerance Mechanism And Realization In Real-Time Distributed Computer Systems
8	The Research On Improving The Fault Tolerance Capability Of Programs In Radiation Environment
9	Research On Fault-Tolerance Technology For Message-Passing System
10	Correlative Study On Fault Diagnosis And Fault Tolerance Of Integrated Navigation System