Hadoop Scheduling Algorithm Based On Job Classification And Cost Comparison

Posted on: 2016-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z Li
GTID: 2308330479493941
Subject: Computer application technology
Abstract/Summary:
Hadoop has been welcomed and widely adopted by many institutes and scholars as an excellent open cloud platform in the development of cloud computing. The scheduler is a key factor in Hadoop's performance. By studying the merits and demerits of several existing classic schedulers, this paper aims to design a well-performing scheduling algorithm, and it proposes an algorithm, based on job classification and cost comparison, that can reduce job running time on Hadoop. Finally, the paper tests the performance of the new algorithm.
Current classic schedulers each have their own merits. For example, the Job Queue Task Scheduler is simple and has low overhead, while the Fair Scheduler takes both large and small jobs into account. However, none of these schedulers considers the processor performance or memory size of the nodes. If a compute-intensive job is scheduled to nodes with high-frequency processors, its running time may be shortened; if a memory-intensive job is scheduled to nodes with large memory, its tasks are less likely to be killed.
When a job is scheduled to a node that does not hold the job's input data, that node has to copy the input data from other nodes, which takes more time than running where the data already resides. So when a job is about to be scheduled to such a node, the proposed algorithm predicts the time to copy the data, queue and run on this node (the non-local scheduling time). It also predicts the time to wait for the next heartbeat, schedule the job to a node that holds its input data, and queue and run there (the local scheduling time). The algorithm then decides where to run the job by comparing the non-local and local scheduling times.
This paper proposes the job classification and cost comparison scheduling algorithm.
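A minimal Python sketch of the cost comparison, assuming the queueing, running and heartbeat-wait times have already been predicted (the abstract does not say how) and that copy time is simply input size divided by network bandwidth. The names `NodeEstimate`, `nonlocal_cost`, `local_cost` and `schedule_nonlocally` are hypothetical, and taking the minimum over several data-holding nodes is an added assumption reflecting HDFS block replication.

```python
from dataclasses import dataclass


@dataclass
class NodeEstimate:
    """Hypothetical predicted times (seconds) for one task on one node."""
    queue_time: float  # time the task would wait in this node's queue
    run_time: float    # time the task would take to execute on this node


def nonlocal_cost(input_mb: float, bandwidth_mb_s: float, node: NodeEstimate) -> float:
    # Non-local: copy the input from another node, then queue and run here.
    copy_time = input_mb / bandwidth_mb_s  # assumed simple size/bandwidth model
    return copy_time + node.queue_time + node.run_time


def local_cost(heartbeat_wait: float, node: NodeEstimate) -> float:
    # Local: wait for a data node's next heartbeat, then queue and run there.
    return heartbeat_wait + node.queue_time + node.run_time


def schedule_nonlocally(input_mb, bandwidth_mb_s, requesting_node, local_options):
    """Return True if running on the requesting (non-data) node is predicted to be faster.

    local_options: list of (heartbeat_wait, NodeEstimate) for nodes holding the input.
    """
    remote = nonlocal_cost(input_mb, bandwidth_mb_s, requesting_node)
    if not local_options:
        return True  # no candidate local node was supplied, so there is nothing to wait for
    best_local = min(local_cost(wait, node) for wait, node in local_options)
    return remote < best_local  # ties favor waiting for data locality


# Usage: 128 MB input over a 64 MB/s link (2 s copy) versus a 3 s heartbeat wait.
requesting = NodeEstimate(queue_time=1.0, run_time=30.0)
data_node = NodeEstimate(queue_time=0.0, run_time=25.0)
print(nonlocal_cost(128, 64, requesting))  # 33.0
print(local_cost(3.0, data_node))          # 28.0
print(schedule_nonlocally(128, 64, requesting, [(3.0, data_node)]))  # False
```

Here the data-local node is predicted to finish sooner (28 s versus 33 s), so the sketch keeps the job waiting for the next heartbeat instead of scheduling it non-locally.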
The algorithm consists of two sub-algorithms: a job classification algorithm, which schedules a job according to the node's type and the job's type, and a cost comparison algorithm, which, when a node does not hold the current job's input data, decides by comparing the costs of non-local and local scheduling. The job classification algorithm uses machine learning to match each job's type to a Task Tracker's type. The cost comparison algorithm ensures that non-local scheduling is no longer done blindly. Finally, the two sub-algorithms are integrated into one scheduler. Extensive tests were run to measure the algorithm's performance, and the results show that it performs well.
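The abstract does not name the machine-learning method used for job classification. As a stand-in, this Python sketch uses a nearest-centroid classifier over two hypothetical features (CPU utilization and memory use, each scaled to [0, 1]) and tags each Task Tracker with the job class it suits: compute-intensive for high-frequency processors, memory-intensive for large memory. `train_centroids`, `classify` and `pick_job` are illustrative names, the training data is made up, and the FIFO fallback when no queued job matches is an assumption.

```python
import math
from collections import defaultdict

# Hypothetical job classes taken from the two cases the abstract names.
CPU_BOUND = "compute-intensive"
MEM_BOUND = "memory-intensive"


def train_centroids(samples):
    """Nearest-centroid training (a stand-in for the unspecified ML method).

    samples: list of ((cpu_util, mem_frac), label), both features scaled to [0, 1].
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (cpu, mem), label in samples:
        sums[label][0] += cpu
        sums[label][1] += mem
        counts[label] += 1
    return {label: (s[0] / counts[label], s[1] / counts[label]) for label, s in sums.items()}


def classify(centroids, features):
    # Assign the class whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def pick_job(node_class, queue, centroids):
    """Choose a job for a Task Tracker tagged with the job class it suits.

    queue: list of (job_id, features) in submission order.
    Prefer the first job whose predicted class matches the node; otherwise fall back to FIFO.
    """
    for job_id, features in queue:
        if classify(centroids, features) == node_class:
            return job_id
    return queue[0][0] if queue else None


# Usage with made-up profiling data.
centroids = train_centroids([
    ((0.9, 0.2), CPU_BOUND), ((0.8, 0.3), CPU_BOUND),
    ((0.3, 0.9), MEM_BOUND), ((0.2, 0.8), MEM_BOUND),
])
queue = [("job1", (0.2, 0.9)), ("job2", (0.9, 0.1))]
print(pick_job(CPU_BOUND, queue, centroids))  # job2
print(pick_job(MEM_BOUND, queue, centroids))  # job1
```

Nearest centroid was chosen only because it fits in a few lines without external libraries; any classifier that maps job profiles to node types could take its place.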
Keywords/Search Tags: Scheduling, Job Classification, Cost Comparison