Font Size: a A A

Transparent dataflow detection and use in workflow scheduling: Concurrency and deadlock avoidance

Posted on:2009-03-03Degree:Ph.DType:Thesis
University:University of Alberta (Canada)Candidate:Wang, YangFull Text:PDF
GTID:2448390002493126Subject:Computer Science
Abstract/Summary:
In this thesis we demonstrate the value of dataflow information to improve makespan performance (i.e., time to complete a set of jobs) in batch-scheduled workloads. Novel mechanisms and policies are introduced to improve job concurrency (i.e., when resources are unlimited) and to reduce the impact of deadlock (i.e., when resources are constrained). Without dataflow information concurrency might be limited, even if resources are unlimited, and resource usage might be inefficient, even if resource utilization is superficially high. The key insight is that dataflow, unlike control-flow, makes it visible when resources can be deallocated or reallocated, which allows for a crucial distinction between active and inactive resource usage. Through a simulation study, we show that the benefits of dataflow information can be a reduced makespan of over a factor of 5, depending on the workload and available resources.;Despite a large body of research on dataflow, most high-performance computing (HPC) systems (e.g., clusters) are batch scheduled based on control-flow status quo. The lack of a simple way to obtain dataflow information and the lack of compelling policies to exploit datatflow may account for the control-flow status quo. Therefore, we describe a simple prototype for transparently gathering dataflow information (i.e., Workflow-aware File System (WaFS)) and several scheduling policies to exploit that knowledge for higher concurrency (e.g., Versioned Namespace (VNS), Overwrite-Safe Concurrency (OSC)) and for better deadlock handling (e.g., Dataflow Aggregate Requests (DAR), Dataflow Topological Ordering (DTO)). Notably, our simulations show how dataflow information allows our policies to have lower makespans than the classic banker's algorithm and Lang's algorithm.
Keywords/Search Tags:Dataflow, Concurrency, Deadlock, Policies
Related items