An architecture for checkpointing and migration of distributed components on the grid

Posted on: 2005-08-16
Degree: Ph.D
Type: Dissertation
University: Indiana University
Candidate: Krishnan, Sriram
Full Text: PDF
GTID: 1458390008487265
Subject: Computer Science
Abstract/Summary:
A computational Grid is a set of hardware and software resources that provide seamless, dependable, and pervasive access to high-end computational capabilities. The Grid differs from other computational resources, such as traditional supercomputers and clusters, in the following key features: (1) coordination of resources that are not subject to centralized control, (2) use of standard, open, general-purpose protocols and interfaces, and (3) delivery of non-trivial qualities of service despite unpredictable resource availability.

The Open Grid Services Architecture (OGSA) is the first effort to standardize Grid functionality, based on concepts from the Web services community. However, the Web-services-based OGSA presents a server-centric approach that is not very conducive to the orchestration of complex distributed applications in which the interactions are not always of the client-server type. We present a distributed-component-based approach for composing complex applications on the Grid that conforms to the Common Component Architecture (CCA) while maintaining compatibility with Grid standards.

Because Grid resources are not subject to centralized control and are geographically distributed, their availability may be very dynamic. Migration of individual components can be an effective strategy for dealing with dynamic resource availability. However, migration of components that are part of a distributed application is complicated by the possible interactions between them during execution. We present an approach for migration of distributed components in the presence of communication between them. The reliability of Grid resources is also very difficult to guarantee. Checkpointing applications and rolling back to a saved state is an effective form of fault tolerance for dealing with failures of such resources. However, because the applications are distributed, the checkpoints generated need to be globally consistent. We present our approach for checkpointing and restart of distributed components for fault-tolerance purposes.
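
To illustrate what "globally consistent" means for a set of checkpoints, the following is a minimal sketch based on the standard textbook definition of a consistent cut: no checkpoint may record the receipt of a message whose send is not recorded in the sender's checkpoint. It is not the dissertation's algorithm. The names `Message` and `is_consistent`, and the use of per-component local event counters, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical record of one message exchanged between two components."""
    sender: str
    receiver: str
    send_time: int  # sender's local event counter at the send
    recv_time: int  # receiver's local event counter at the receive


def is_consistent(checkpoints: dict[str, int], messages: list[Message]) -> bool:
    """Check the consistent-cut condition for a set of local checkpoints.

    checkpoints maps each component name to the local event counter at which
    its checkpoint was taken. The set is inconsistent if any message is
    recorded as received but not recorded as sent (an orphan message).
    """
    for m in messages:
        received_in_cut = m.recv_time <= checkpoints[m.receiver]
        sent_in_cut = m.send_time <= checkpoints[m.sender]
        if received_in_cut and not sent_in_cut:
            return False  # orphan message: restart would see a receive with no send
    return True
```

For example, if component A sends a message at its local event 5 and B receives it at its local event 3, then checkpoints taken at A=4 and B=3 are inconsistent, while A=5 and B=3 are consistent. A message that is sent but not yet received at checkpoint time (in flight) does not violate this condition, although a restart protocol would still need to handle it.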
Keywords/Search Tags:Distributed, Grid, Resources, Migration, Architecture, Approach