
Resource management for scientific workflows

Posted on: 2013-06-09
Degree: Ph.D.
Type: Thesis
University: University of Southern California
Candidate: Juve, Gideon M
GTID: 2458390008473991
Subject: Computer Science
Abstract/Summary:
Scientific workflows are a parallel computing technique used to orchestrate large, complex, multi-stage computations for data analysis and simulation in many academic domains. Resource management is a key problem in workflow execution because workflows often involve large computations and datasets that must be distributed across many resources to complete in a reasonable time. Traditionally, resources in distributed computing systems such as clusters and grids were allocated to workflow tasks through batch scheduling: tasks were submitted to a batch queue and matched to available resources just before execution. Recently, performance and quality-of-service considerations on the grid, together with the development of cloud computing, have made it advantageous (and, in the case of cloud computing, necessary) for workflow applications to provision resources explicitly ahead of execution. This trend toward resource provisioning has created many new problems and opportunities in the management of scientific workflows. This thesis explores several of these resource management issues and describes some potential solutions.

This thesis makes the following contributions:

1. It describes several problems associated with resource provisioning in cluster and grid environments and presents a new provisioning approach based on pilot jobs that benefits both resource owners and application users in terms of performance, quality of service, and efficiency. It also describes the design and implementation of a pilot-job-based system that enables applications to bypass restrictive grid scheduling policies; this system is shown to reduce the makespan of several workflow applications by 32% to 48% on average.

2. It describes the challenges of provisioning resources for workflows and other distributed applications in Infrastructure as a Service (IaaS) clouds and presents a new technique, based on directed acyclic graphs, for modeling complex distributed applications. This model is used to develop a system for automatically deploying and managing distributed applications in infrastructure clouds. The system has been used to provision hundreds of virtual clusters for executing scientific workflows in the cloud.

3. It describes the challenges and benefits of running workflow applications in infrastructure clouds and presents the results of several studies of the cost and performance of running workflow applications on Amazon EC2 with a variety of resource types and storage systems. These studies compared the performance of workflows in grids and clouds, characterized the virtualization overhead of workflow applications in the cloud, compared the cost and performance of different storage systems for workflows in the cloud, and evaluated the long-term costs of hosting workflow applications in the cloud.

4. It investigates the problem of predicting the resource needs of workflow applications from historical data and describes a technique for collecting detailed resource usage records for workflow applications. The technique is used to collect and analyze the resource usage of six real workflow applications, and the resulting data are used to identify potential bugs and opportunities for optimizing the workflows. Beyond estimating resource requirements, such data can also serve as input to simulations of scheduling algorithms and workflow management systems.

5. It investigates issues related to dynamic resource provisioning for workflow ensembles and describes three algorithms (one offline and two online) for provisioning and scheduling workflow ensembles under deadline and budget constraints. The relative performance of these algorithms is evaluated using several applications under a variety of realistic conditions, including resource provisioning delays and task estimation errors. The results show that the offline algorithm achieves higher performance under perfect conditions, but the online algorithms adapt better to errors and delays without exceeding the constraints.
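The first contribution relies on pilot jobs: placeholder jobs that pass through the batch queue once, hold a resource, and then run application tasks directly, so individual tasks do not each wait in the batch queue. The Python sketch below illustrates only that general pattern, under the simplifying assumption that local threads stand in for remote pilots; `run_pilots` and the task tuples are hypothetical names, not part of the system described in the thesis.

```python
import queue
import threading


def run_pilots(tasks, n_pilots):
    """Run (name, callable) tasks using n_pilots pilot workers.

    Each pilot models one resource acquired once through the batch
    scheduler. It repeatedly pulls tasks from a shared application-level
    queue and exits (releasing its resource) when no work remains.
    Threads stand in for remote pilots here, a simplifying assumption.
    """
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = {}  # task name -> (id of pilot that ran it, return value)
    lock = threading.Lock()

    def pilot(pilot_id):
        while True:
            try:
                name, fn = work.get_nowait()
            except queue.Empty:
                return  # queue drained: the pilot terminates
            value = fn()
            with lock:
                results[name] = (pilot_id, value)

    threads = [threading.Thread(target=pilot, args=(i,)) for i in range(n_pilots)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


# Six toy tasks shared by two pilots; which pilot runs which task varies.
tasks = [(f"task_{i}", (lambda i=i: i * i)) for i in range(6)]
results = run_pilots(tasks, n_pilots=2)
print(sorted(v for _, v in results.values()))  # [0, 1, 4, 9, 16, 25]
```

The assignment of tasks to pilots is dynamic: whichever pilot is free takes the next task, which is the property that lets pilot-based systems keep resources busy without per-task queue waits.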
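The second contribution models a distributed application as a directed acyclic graph. One way such a model can drive deployment is shown in the minimal sketch below, assuming each node is a virtual machine role and each edge means "must be running first". It uses the standard topological sort in Python's `graphlib` (Python 3.9+); the node names and the `deployment_waves` helper are hypothetical, not the thesis's actual system.

```python
from graphlib import TopologicalSorter


def deployment_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves; each node depends only on nodes in earlier waves.

    `dependencies` maps each node to the set of nodes it depends on.
    Raises graphlib.CycleError if the description is not acyclic.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates the graph and detects cycles
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for deterministic output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as deployed, unlocking dependents
    return waves


# Hypothetical virtual cluster: a file server and a head node must be up
# before either worker can be configured.
cluster = {
    "nfs_server": set(),
    "head_node": set(),
    "worker_1": {"nfs_server", "head_node"},
    "worker_2": {"nfs_server", "head_node"},
}

for i, wave in enumerate(deployment_waves(cluster)):
    print(f"wave {i}: {wave}")
```

Nodes in the same wave do not depend on one another, so a deployment tool could start them in parallel; a cyclic description is rejected with `graphlib.CycleError` before anything is launched.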
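The fifth contribution concerns provisioning and scheduling workflow ensembles under deadline and budget constraints. The sketch below is a deliberately simple, offline-style greedy planner and is not one of the thesis's three algorithms. It assumes identical VMs billed per hour at a fixed price, a deadline in whole hours, perfectly divisible work (no task dependencies or provisioning delays), and runtime estimates expressed in VM-hours; `Workflow` and `plan_ensemble` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    priority: int      # lower value = more important
    est_hours: float   # estimated total work, in VM-hours (an assumption)


def plan_ensemble(workflows, budget, deadline_h, price_per_vm_h):
    """Size a VM pool from budget and deadline, then admit workflows greedily.

    Workflows are considered in priority order; any workflow whose estimated
    work would overflow the remaining capacity is skipped.
    """
    # Largest pool that can run for the whole deadline within the budget.
    n_vms = math.floor(budget / (price_per_vm_h * deadline_h))
    capacity = n_vms * deadline_h  # VM-hours available before the deadline
    admitted, used = [], 0.0
    for wf in sorted(workflows, key=lambda w: w.priority):
        if used + wf.est_hours <= capacity:
            admitted.append(wf.name)
            used += wf.est_hours
    return n_vms, admitted


ensemble = [
    Workflow("A", priority=0, est_hours=120.0),
    Workflow("B", priority=1, est_hours=100.0),
    Workflow("C", priority=2, est_hours=60.0),
]
n_vms, admitted = plan_ensemble(ensemble, budget=100.0, deadline_h=10, price_per_vm_h=0.5)
print(n_vms, admitted)  # 20 ['A', 'C']
```

Because this plan is fixed from estimates up front, an underestimated workflow could push the pool past the deadline; an online approach would instead revisit provisioning and admission decisions as actual task runtimes become known.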
Keywords/Search Tags: Workflow, Resource, Scientific, Used, Data, Algorithms, Technique, Computing