Grid workflow: A flexible framework for fault tolerance in the grid

Posted on: 2004-07-03
Degree: Ph.D.
Type: Thesis
University: University of Southern California
Candidate: Hwang, Soonwook
Full Text: PDF
GTID: 2468390011976621
Subject: Computer Science
Abstract/Summary:
Over the past few years, the Grid has emerged as a new infrastructure for developing so-called Grid applications by enabling the integration of instruments, displays, and computing resources that are managed by diverse organizations in widespread locations. Even though the geographically distributed and non-centralized administrative nature of the Grid can make it prone to failures during task execution, the research focus so far has not been on fault tolerance.

This thesis is intended to improve this situation by presenting the Grid Workflow System (Grid-WFS), designed to provide a special form of fault tolerance for the Grid: a generic failure detection mechanism and a flexible failure handling framework.

The generic failure detection mechanism enables the detection of generic task crash failures. In addition, the mechanism allows users to define exceptions to handle task-specific failures without requiring any modifications to either the Grid protocol or the local policy of each Grid node. This thesis describes how to overcome this challenge by employing an event notification mechanism that is based on the interpretation of notification messages delivered from different entities residing on each Grid node.

The flexible failure handling framework allows users to achieve failure recovery in a variety of ways, depending on the requirements and constraints of their applications. Central to the framework is flexibility in handling failures. The heterogeneity of the Grid environment and its applications, together with the dynamic nature of the Grid, dictates that a single monolithic failure recovery strategy is not appropriate. This thesis describes how to achieve this flexibility by using the workflow structure as a high-level recovery policy specification, which enables support for multiple failure recovery techniques, the separation of failure handling strategies from the application code, and user-defined exception handling.

Finally, this thesis presents an experimental evaluation of the Grid-WFS using a simulation, demonstrating the value of supporting multiple failure recovery techniques in Grid systems to achieve high performance in the presence of failures.
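
As a rough illustration of the idea of keeping the recovery policy in the workflow structure rather than in the application code, the following minimal Python sketch may help. It is not the Grid-WFS implementation or its specification language. The names Step, TaskFailure, execute, and make_flaky are hypothetical, and the two recovery techniques shown (retrying a task, then falling back to an alternative task) are assumed examples, since the abstract does not name the techniques it supports.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class TaskFailure(Exception):
    """Signals that a task failed (hypothetical name, for illustration only)."""


@dataclass
class Step:
    """One workflow node; the recovery policy lives here, not in the task code."""
    name: str
    run: Callable[[], str]
    max_tries: int = 1                    # assumed task-level technique: retry
    alternative: Optional["Step"] = None  # assumed workflow-level technique: fallback


def execute(step: Step) -> str:
    """Run a step, applying the recovery policy declared on it."""
    for _ in range(step.max_tries):
        try:
            return step.run()
        except TaskFailure:
            continue  # retry until the declared number of tries is used up
    if step.alternative is not None:
        return execute(step.alternative)  # fall back to the alternative task
    raise TaskFailure(f"{step.name} failed and no alternative is defined")


def make_flaky(failures: int, result: str) -> Callable[[], str]:
    """Build a simulated task that fails a fixed number of times, then succeeds."""
    remaining = {"count": failures}

    def run() -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise TaskFailure("simulated crash")
        return result

    return run
```

In this sketch, a step declared as Step("fast", make_flaky(2, "fast result"), max_tries=3) recovers by retrying, while a step whose tries run out hands control to its alternative step. The task functions themselves contain no recovery logic, which is the separation the abstract describes.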
Keywords/Search Tags:Grid, Fault tolerance, Failure, Framework, Flexible, Workflow