| A cluster is an architecture built from a group of nodes connected by a high-speed network. It achieves efficient parallel processing through centralized scheduling and coordinated control, and its performance is often hundreds or even thousands of times that of a contemporary PC. Because of their high performance-to-price ratio, clusters have become the main direction of large-scale computing after supercomputers. Compared with a traditional supercomputer, a cluster is built in a loosely coupled way. This makes the cluster more scalable, but it places higher demands on resource management and job scheduling. The principal functions of job scheduling are to meet users' demands, manage the cluster's resources in a unified way, and arrange the execution of jobs efficiently. Essentially, job scheduling applies a strategy to decide which job runs (job selection), when and where it runs (resource allocation), and how it runs. A good job scheduling system can improve cluster utilization and reduce job response time, so job scheduling has become an active research topic. Job scheduling is supported by the cluster scheduling system and carried out by job scheduling algorithms. Many scheduling algorithms have been developed, such as FCFS, backfill, and reservation; in particular, backfill is considered one of the most efficient. However, if a job's actual running time is longer than its estimated time, backfill cannot make full use of system resources. In addition, with advances in hardware, more and more clusters are built from fat nodes. Analysis of the theory of parallel applications and of experimental results shows that the more nodes a parallel application spans, the lower its performance.
The degradation is especially pronounced when available network bandwidth is limited. To address these two issues, this paper presents a job scheduling algorithm that combines an improved backfill with less-span resource mapping, where the span is the number of nodes a parallel application crosses. Based on an analysis of the OpenPBS scheduling mechanism, a simple scheduling system is implemented on top of OpenPBS; it extends the OpenPBS API and implements the optimized scheduling algorithm. The results show that the optimized algorithm reduces the average wait time, average run time, and average slowdown, and increases system utilization. The optimized algorithm therefore offers useful reference value and prospects for practical application. |
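The backfill idea mentioned above can be illustrated with a minimal Python sketch of the conventional EASY variant. This is not the improved backfill proposed in the paper, whose details are not given here. The `Job` and `Running` classes and the `easy_backfill` function are hypothetical names. The sketch assumes three things: each job requests whole nodes, each job supplies a runtime estimate, and the blocked head job fits in the machine. The reservation (the "shadow time") is computed from those estimates, so the quality of the resulting schedule depends on how accurate the estimates are.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int        # number of nodes requested (hypothetical model)
    estimate: float   # user-supplied runtime estimate


@dataclass
class Running:
    job: Job
    end: float        # estimated completion time (start + estimate)


def easy_backfill(now, total_nodes, running, queue):
    """One scheduling pass of EASY backfilling; returns names of jobs started at `now`.

    Assumes the head job requests no more than `total_nodes` nodes.
    """
    free = total_nodes - sum(r.job.nodes for r in running)
    started = []
    queue = list(queue)  # do not mutate the caller's queue
    # 1. Start jobs strictly in FCFS order while they fit.
    while queue and queue[0].nodes <= free:
        job = queue.pop(0)
        free -= job.nodes
        running = running + [Running(job, now + job.estimate)]
        started.append(job.name)
    if not queue:
        return started
    # 2. Reserve nodes for the blocked head job: the shadow time is when enough
    #    running jobs have finished (according to their estimates).
    head = queue[0]
    avail, shadow = free, now
    for r in sorted(running, key=lambda x: x.end):
        if avail >= head.nodes:
            break
        avail += r.job.nodes
        shadow = r.end
    extra = avail - head.nodes  # nodes the head job will not need at the shadow time
    # 3. Backfill later jobs that fit now and cannot delay the head job's reservation.
    for job in queue[1:]:
        if job.nodes > free:
            continue
        ends_before_shadow = now + job.estimate <= shadow
        if ends_before_shadow or job.nodes <= extra:
            free -= job.nodes
            if not ends_before_shadow:
                extra -= job.nodes  # still running at the shadow time
            started.append(job.name)
    return started
```

Consider a 10-node cluster where job A holds 6 nodes until t=100 and the head job B needs 8 nodes. B is reserved for t=100, which leaves 2 spare nodes at that time. Both a short 2-node job and a long 2-node job can then be backfilled without delaying B.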
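Less-span resource mapping can be sketched as a greedy placement that puts a job's requested processor cores on as few fat nodes as possible. This sketch assumes each node reports a count of free cores. The `less_span_map` function and its dict input are hypothetical illustrations, not the paper's exact mapping.

```python
def less_span_map(free_cores, request):
    """Map `request` cores onto as few nodes as possible.

    free_cores: dict of node name -> free core count (hypothetical input format).
    Returns a dict of node -> cores taken, or None if the request cannot be met.
    """
    if request <= 0 or sum(free_cores.values()) < request:
        return None
    # Largest-first: the m nodes with the most free cores provide the largest
    # capacity any m nodes can provide, so this reaches the minimum span.
    order = sorted(free_cores, key=lambda n: (-free_cores[n], n))
    mapping = {}
    remaining = request
    for i, node in enumerate(order):
        if free_cores[node] >= remaining:
            # Last chunk: use the tightest remaining node that still fits, so
            # larger nodes stay whole for later wide jobs (span is unchanged).
            candidates = [n for n in order[i:] if free_cores[n] >= remaining]
            best = min(candidates, key=lambda n: (free_cores[n], n))
            mapping[best] = remaining
            return mapping
        mapping[node] = free_cores[node]
        remaining -= free_cores[node]
    return None
```

Taking the largest nodes first guarantees the minimum number of nodes. The best-fit choice for the final chunk only reduces fragmentation: it never increases the span, because the next-largest node can always hold the remainder.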