
User-aware scheduling for high performance computing clusters

Posted on: 2006-06-14
Degree: Ph.D.
Type: Dissertation
University: Illinois Institute of Technology
Candidate: North, Michael J
Full Text: PDF
GTID: 1458390008456869
Subject: Statistics
Abstract/Summary:
High performance computing clusters have been a critical resource for computational science for over a decade and have more recently become integral to large-scale industrial analysis. Despite this critical role, cluster performance is difficult to predict and challenging to control. This uncertainty limits the ability of cluster users to anticipate the service levels they will receive for specific resource requests. The result is a degradation of the value of cluster services and a reduction in the accuracy of the inputs users give to the scheduler. One of the root causes of this uncertainty is the complex relationship between user behavior, user resource utilization, and cluster operation. This relationship has been studied by many researchers. The resulting workload models largely focus on long-range correlations and thus assume there is little or no short-range user-level locality in workloads. While these models have been successful for their purposes, this dissertation shows that there are far more correlations to be found in user workloads, including short-range locality. This short-range locality may be particularly important since real-time measurements of this type of correlation require minimal data collection latency. This opens the possibility of scheduler enhancements that immediately utilize information about short-range locality. This dissertation first presents several contributions that characterize the short-range locality in cluster workloads and then presents contributions that show how these results can be used to improve scheduler efficiency. Notably, this dissertation applies regression trees for the first time to model Maui scheduler workloads. The result is the first unified regression estimator that successfully predicts wall clock times with an adjusted R² of 0.6 or above for two different production clusters without changing regression parameters.
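For readers unfamiliar with the adjusted R² figure quoted above, the following is a minimal sketch of how that statistic is commonly computed from actual and predicted wall clock times. It is not taken from the dissertation: the function name `adjusted_r2`, the sample values, and the choice of one predictor are all hypothetical and only illustrate the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1).

```python
def adjusted_r2(actual, predicted, num_predictors):
    """Return the adjusted R^2 of predictions against observed values.

    actual, predicted: equal-length sequences of numbers
    num_predictors: number of explanatory variables (p) used by the model
    """
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if n - num_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")

    mean_actual = sum(actual) / n
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have zero variance")

    r2 = 1.0 - ss_res / ss_tot
    # Penalize R^2 for the number of predictors relative to sample size.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - num_predictors - 1)


# Hypothetical wall clock times in seconds (illustrative values only).
observed = [120.0, 300.0, 45.0, 600.0, 90.0, 240.0]
estimated = [150.0, 280.0, 60.0, 520.0, 110.0, 260.0]
score = adjusted_r2(observed, estimated, num_predictors=1)
print(f"adjusted R^2 = {score:.3f}")
```

Because the penalty grows with the number of predictors, adjusted R² lets estimators with different numbers of inputs be compared on a more even footing than plain R².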
Also for the first time, this dissertation proposes Maui scheduler modifications that apply short-range user behavioral regression tree modeling to improve cluster scheduling performance. This dissertation then demonstrates that the proposed Maui scheduler modifications can reduce the average job waiting time by as much as 28% without introducing the risk of job starvation and while preserving critical cluster productivity measures such as job stream wall clock time and average throughput.
Keywords/Search Tags:Cluster, Performance, User, Critical, Time