
Job co-allocation strategies in multiple HPC clusters

Posted on: 2010-12-08
Degree: Ph.D.
Type: Thesis
University: The University of Western Ontario (Canada)
Candidate: Qin, Jinhui
GTID: 2448390002970473
Subject: Computer Science
Abstract/Summary:
To use a network of high-performance computing clusters more effectively, it becomes attractive to allocate multi-process jobs across multiple connected clusters. This allocation process, which we refer to as co-allocation, divides the processes of a job among several clusters. Co-allocation offers the possibility of more efficient use of computing resources, reduced turnaround time, and computations that use more processors than any single cluster provides. Whether these benefits are realized, however, ultimately depends on the cost of inter-cluster communication. In this thesis, we introduce a scalable co-allocation strategy called the Maximum Bandwidth Adjacent cluster Set (MBAS) strategy. The strategy uses two thresholds to control allocation: one controls the saturation level of inter-cluster communication links, and the other controls how jobs are split. To evaluate the performance of the proposed strategy, we also developed and validated a simulator that models the dynamic behavior of jobs running across multiple clusters. The simulation results indicate that, by adjusting the thresholds for link saturation level control and for chunk size control in job splitting, the MBAS co-allocation strategy can significantly improve both user satisfaction and system utilization. In practice, however, the situation is more complicated, because the mix of communication patterns can vary. Being able to adjust the thresholds dynamically may therefore provide a more effective approach to co-allocation. To this end, the thesis introduces the Adaptive Threshold Control System (ATCS). Based on fuzzy logic, ATCS adjusts the thresholds dynamically according to the system state and the characteristics of jobs. The simulation results suggest that using ATCS during MBAS job co-allocation improves overall performance beyond what static thresholds achieve.
Moreover, this improvement is much more tolerant of changes in job communication requirements, whereas such changes are a problem for static thresholds. In addition, ATCS provides the flexibility to tune a system toward more expressive co-allocation control in practice.
Keywords: resource management, job co-allocation, job scheduling, HPC clusters, performance evaluation, workload characterization, fuzzy control, adaptive control...
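The abstract names the two MBAS thresholds (link saturation level and chunk size) and says ATCS tunes them with fuzzy logic, but it does not give the algorithms. The Python sketch below is a hypothetical illustration of how such thresholds could interact, not the thesis's method. Everything in it is an assumption: the function names `co_allocate`, `adaptive_saturation_threshold` and `tri`, the greedy "add the adjacent cluster with the most available link bandwidth" rule, the per-link traffic estimate `comm_demand`, and the three-rule fuzzy rule base (a zero-order Sugeno controller with a single input, job communication intensity).

```python
"""Hypothetical sketch of threshold-controlled co-allocation.

Not the thesis's MBAS/ATCS implementation: the greedy rule, data layout and
fuzzy rule base below are illustrative assumptions.
"""


def tri(x, a, b, c):
    """Triangular fuzzy membership: 0 outside (a, c), peaks at 1 when x == b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def adaptive_saturation_threshold(comm_intensity):
    """Zero-order Sugeno inference over an assumed, ATCS-like rule base.

    comm_intensity in [0, 1]: how communication-heavy the job is.
    More communication-heavy jobs get a stricter (lower) link threshold.
    """
    x = min(max(comm_intensity, 0.0), 1.0)
    rules = [
        (tri(x, -0.5, 0.0, 0.5), 0.9),  # LOW intensity    -> lenient threshold
        (tri(x, 0.0, 0.5, 1.0), 0.7),   # MEDIUM intensity -> moderate threshold
        (tri(x, 0.5, 1.0, 1.5), 0.5),   # HIGH intensity   -> strict threshold
    ]
    total = sum(weight for weight, _ in rules)  # > 0 for every x in [0, 1]
    return sum(weight * out for weight, out in rules) / total


def co_allocate(job_procs, comm_demand, free, links, sat_threshold, min_chunk):
    """Greedy maximum-available-bandwidth adjacent-cluster selection.

    free:  {cluster: free processors}
    links: {frozenset({a, b}): (capacity, used)}; comm_demand is the job's
           assumed traffic on each inter-cluster link it uses, in the same units.
    Returns {cluster: processes}, or None if the job must wait.
    """
    start = max(free, key=free.get)  # cluster with the most free processors
    if free[start] >= job_procs:
        return {start: job_procs}  # fits on one cluster: no co-allocation
    if free[start] < min_chunk:
        return None  # even the largest chunk would be too small
    alloc = {start: free[start]}
    remaining = job_procs - free[start]
    while remaining > 0:
        candidates = []
        for link, (cap, used) in links.items():
            inside = [c for c in link if c in alloc]
            outside = [c for c in link if c not in alloc]
            if len(inside) != 1 or len(outside) != 1:
                continue  # link does not leave the current cluster set
            other = outside[0]
            chunk = min(free.get(other, 0), remaining)
            if chunk < min_chunk:
                continue  # chunk-size threshold: reject small fragments
            if (used + comm_demand) / cap > sat_threshold:
                continue  # saturation threshold: link would be too busy
            candidates.append((cap - used, other, chunk))
        if not candidates:
            return None  # no admissible adjacent cluster: the job waits
        _, other, chunk = max(candidates)  # most available bandwidth wins
        alloc[other] = chunk
        remaining -= chunk
    return alloc


if __name__ == "__main__":
    free = {"A": 64, "B": 32, "C": 48}
    links = {
        frozenset({"A", "B"}): (100.0, 20.0),
        frozenset({"A", "C"}): (100.0, 70.0),
        frozenset({"B", "C"}): (50.0, 10.0),
    }
    threshold = adaptive_saturation_threshold(0.5)  # 0.7 for a medium job
    print(co_allocate(112, 10.0, free, links, threshold, min_chunk=8))
```

For a medium-intensity job the fuzzy controller yields a threshold of 0.7, so the A-C link (80% utilized once the job's traffic is added) is rejected and cluster C is reached through B instead, giving `{'A': 64, 'B': 32, 'C': 16}`. Lowering the threshold to 0.2 leaves no admissible link and the sketch returns `None`, i.e. the job waits. This shows, in miniature, why a fixed threshold can be too strict or too lenient as the communication mix changes.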