The Design And Implementation Of A Job Management And Scheduling System In A Special Computing Cluster Group

Posted on: 2006-10-02, Degree: Master, Type: Thesis
Country: China, Candidate: Y Zhang, Full Text: PDF
GTID: 2178360185963646, Subject: Computer Science and Technology
Abstract/Summary:
HPC cluster systems offer powerful parallel computing and large-scale batch computing capabilities, so they can meet a wide range of application requirements. Large-scale parallel computation is the main model of cluster-based applications, and large-scale batch computation is another important one. When many jobs are submitted to the clusters, dedicated resource allocation and scheduling policies are needed to optimize the system. This thesis focuses on that subject.

The thesis studies how to optimize job scheduling and running efficiency in a special multi-cluster environment for CFD, and discusses techniques for job scheduling, job migration, file system backup and restoration, and web-based job submission. The computer system at the author's organization consists of a group of computing clusters, referred to here as the special cluster group.

To address resource reservation planning and memory use efficiency, the author designs and implements a job scheduling system with a dedicated job scheduling algorithm, and proposes a special algorithm for estimating the memory usage of jobs whose memory usage changes continuously.

To address load balancing between clusters, the author and his research team build a job migration management system that works with the dedicated job scheduling system of each cluster. It sets up a multi-cluster resource reservation plan and a mechanism that lets jobs use idle resources anywhere in the cluster group by migrating, and it formulates an algorithm for comparing migration destinations.

The system uses the PVFS parallel file system to increase the I/O performance of large-scale clusters. To solve the key problems that reduce the availability of PVFS, the author proposes an original technique for rapidly backing up and restoring the file system.

The thesis also introduces the design of the web-based job submission and management system.

These research results are in use at the author's organization and have produced good results. At busy times, the utilization of system resources has increased from nearly 80% to 95% or more, and jobs no longer wait in the queue while one or more CPUs are idle.
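The abstract does not describe how memory usage is estimated for jobs whose usage keeps changing. As an illustration only, the sketch below assumes one simple possibility: keep a sliding window of recent usage samples and take the window peak, scaled by a safety margin, as the estimate. The class name `MemoryEstimator`, the function `fits`, and the parameters `window` and `headroom` are all hypothetical and are not taken from the thesis.

```python
from collections import deque


class MemoryEstimator:
    """Hypothetical sliding-window estimator; not the thesis's algorithm."""

    def __init__(self, window=5, headroom=1.1):
        # Keep only the most recent `window` usage samples (in MB).
        self.samples = deque(maxlen=window)
        # Assumed safety margin applied on top of the observed peak.
        self.headroom = headroom

    def observe(self, used_mb):
        """Record one memory usage sample for the job."""
        self.samples.append(used_mb)

    def estimate(self):
        """Return the recent peak usage scaled by the safety margin."""
        if not self.samples:
            return 0.0
        return max(self.samples) * self.headroom


def fits(node_free_mb, estimator):
    """Check whether the estimated demand fits in a node's free memory."""
    return estimator.estimate() <= node_free_mb
```

A scheduler could call `observe` periodically for each running job and use `fits` when deciding whether another job can share a node. A window peak reacts quickly to growth and forgets old spikes once they leave the window, which is one plausible trade-off among many.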
Keywords/Search Tags: cluster, schedule, migration, file system