
Efficient data and information delivery for workflow execution in grids

Posted on: 2010-02-12
Degree: Ph.D.
Type: Thesis
University: University of Southern California
Candidate: Bharathi, Shishir
Full Text: PDF
GTID: 2448390002987607
Subject: Computer Science
Abstract/Summary:
In recent years, scientific communities have increasingly adopted computational workflows to model and execute data processing and visualization applications. Some of these applications consume and generate large amounts of data. The planning and execution-time strategies employed to stage data in and out of compute resources can have a significant impact on the overall execution of the workflow.

The primary focus of this thesis is on identifying execution-time data management strategies that reduce the execution time of the workflow. We incorporate a data placement service component into the workflow execution framework that utilizes information provided by the planner to efficiently deliver data in and out of compute resources.

We present detailed characteristics of the data processing requirements of five diverse scientific applications. We use this information to generate synthetic workflows that closely resemble these real-world applications. We present a framework that classifies various data staging strategies into decoupled, loosely-coupled, or tightly-coupled modes based on the level of integration of the data placement service with the workflow manager component. We also present the results of a detailed simulation study that evaluated the impact of various data staging strategies on workflows of different scales and structures.

Next, we focus on applying tightly-coupled data staging strategies to the execution of data-intensive workflows on storage-constrained resources. We identify key problems in this area and develop a heuristic aimed at minimizing workflow execution times under storage constraints. We apply genetic-algorithm-based approaches to these problems and show that the performance of our heuristic is comparable to the best genetic algorithm solutions.

Finally, we consider the planning stage and discuss the challenges of delivering information about Grid resources to scheduling applications. We present our work on the design and implementation of a peer-to-peer information system based on the GT4 Index Service. We present experimental results demonstrating the scalability of the service in different networks, including the PlanetLab test bed.
Keywords/Search Tags:Data, Workflow, Execution, Information, Applications, Present, Service