
Toward practical multi-workflow scheduling in cluster and grid environments

Posted on: 2010-06-17
Degree: Ph.D.
Type: Dissertation
University: Wayne State University
Candidate: Yu, Zhifeng
Full Text: PDF
GTID: 1448390002476063
Subject: Operations Research
Abstract/Summary:
Workflow applications have gained popularity in recent years because of the prevalence of cluster and Grid environments. Many scheduling algorithms have been developed since, but two fundamental challenges in this area, dynamic resources and dynamic workload, are not well addressed. In cluster and Grid environments, resources may be contributed and controlled by different virtual organizations and shared by a variety of users, who in turn submit various kinds of applications. Resources are heterogeneous and under different ownership; their availability varies over time, and they may fail at a high rate. At the same time, resources are shared, so many applications with various computation requirements compete for them. Existing static algorithms are designed to schedule a single workflow application, without considering other workloads or any resource competition in the system. Hence static approaches are not widely used in practice despite their known advantages. Dynamic scheduling approaches handle dynamic workload and resources by nature, but their effectiveness has yet to be optimized: they have no global view of the workflow application, and each scheduling decision is made locally and nearsightedly.

In this dissertation, as an effort toward practical scheduling of workflow applications in cluster and Grid environments, a failure-aware dynamic scheduling strategy for multiple workflow applications is proposed. The approach makes a scheduling decision only when a task is ready, as a traditional dynamic approach does, but leverages task dependency information, execution time estimation, failure prediction and queue wait time prediction. With a priority preassigned to each task by the workflow Planner, the workflow Executor globally prioritizes all ready-to-execute tasks in the queue and schedules each task to the most suitable resource collection in order to minimize the overall workflow execution time.
Furthermore, the algorithm is extended to a cluster-of-clusters environment, where each cluster has its own local workload management system. In conclusion, the findings of the research are fourfold: (1) With adaptability to dynamic resource changes, the proposed strategy not only outperforms purely dynamic strategies but also improves on traditional static ones, and it performs more efficiently for data-intensive applications with a higher degree of parallelism. (2) When guided by the Planner, the proposed strategy can schedule multiple workflows dynamically without requiring the workflows to be merged a priori. It significantly outperforms two other traditional dynamic algorithms, by 43.6% and 36.7% with respect to workflow makespan and turnaround time respectively, and it performs even better when the number of concurrent workflow applications increases and resources are scarce. (3) We observe that the traditional definitions of failure prediction accuracy have different performance implications for different applications and fail to measure how prediction improves scheduling effectiveness; we therefore propose two definitions of failure prediction accuracy, from the perspectives of the system and of scheduling respectively. A comprehensive evaluation using real failure traces shows that the proposed strategy performs well with practically achievable prediction accuracy, reducing the average makespan, the loss time and the number of job reschedulings. (4) The proposed algorithm can be extended to Grids in the form of multiclusters, where each cluster has its own workload management system. The proposed queue wait time aware algorithm leverages advances in queue wait time prediction techniques, and the work empirically studies whether the tunability of resource requirements helps scheduling.
Extensive experiments with both real workload traces and a test bench show that the queue wait time aware algorithm improves workflow performance by 3 to 10 times in terms of average makespan, with a relatively low cost of data movement.

Finally, the research studies how to benefit from existing research and practice in both static and dynamic scheduling and introduces a hybrid scheduling scheme, a planner-guided dynamic scheduling approach, that targets dynamic workloads in cluster and Grid environments. A prototype is developed on the Condor platform as a proof of concept for the proposed algorithm.
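
The planner-guided idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the ranking heuristic (task cost plus the longest remaining path), the identical resources, and all names (`Task`, `plan`, `execute`, `demo`) are assumptions made for illustration, and failure prediction and queue wait time prediction are left out entirely. The sketch only shows how priorities assigned per workflow ahead of time can drive one global queue of ready tasks drawn from several workflows.

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str                 # must be unique across all workflows
    cost: float               # estimated execution time (assumed > 0)
    children: list = field(default_factory=list)
    parents: list = field(default_factory=list)
    rank: float = 0.0         # priority assigned by the planner

def link(parent, child):
    """Record that child depends on parent."""
    parent.children.append(child)
    child.parents.append(parent)

def plan(tasks):
    """Planner (assumed heuristic): rank = cost + longest path to an exit task."""
    def rank(t):
        if t.rank == 0.0:
            t.rank = t.cost + max((rank(c) for c in t.children), default=0.0)
        return t.rank
    for t in tasks:
        rank(t)

def execute(tasks, n_resources):
    """Executor: dispatch ready tasks from all workflows by global rank.

    Simplified list scheduling on identical resources; returns
    (makespan, dispatch order).
    """
    finish = {}                               # task name -> finish time
    free_at = [0.0] * n_resources             # when each resource is free
    ready = [(-t.rank, t.name, t) for t in tasks if not t.parents]
    heapq.heapify(ready)
    order = []
    while ready:
        _, _, t = heapq.heappop(ready)        # highest rank first
        r = min(range(n_resources), key=free_at.__getitem__)
        start = max([free_at[r]] + [finish[p.name] for p in t.parents])
        finish[t.name] = free_at[r] = start + t.cost
        order.append(t.name)
        for c in t.children:                  # release newly ready children
            if all(p.name in finish for p in c.parents):
                heapq.heappush(ready, (-c.rank, c.name, c))
    return max(finish.values()), order

def demo():
    """Two independent two-task workflows sharing two resources."""
    a1, a2 = Task("a1", 2.0), Task("a2", 3.0)
    b1, b2 = Task("b1", 4.0), Task("b2", 1.0)
    link(a1, a2)
    link(b1, b2)
    tasks = [a1, a2, b1, b2]
    plan(tasks)                               # planner runs before execution
    return execute(tasks, n_resources=2)
```

Calling `demo()` ranks each workflow's tasks independently and then lets the executor interleave them, which mirrors the claim that the workflows do not need to be merged a priori.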
Keywords/Search Tags:Cluster and grid, Workflow, Scheduling, Algorithm, Applications, Proposed, Queue wait time, Dynamic