
Research Of The Job Scheduling Algorithm On Hadoop Cloud Platform

Posted on: 2015-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: L M Yue
Full Text: PDF
GTID: 2298330434957711
Subject: Computer application technology
Abstract/Summary:
Internet giants such as Google, Yahoo, Amazon, and Microsoft hold enormous amounts of data. This exponential data growth raises many problems, so they have had to develop new technologies to analyze terabyte- and petabyte-scale data sets and extract useful information. Such information helps these companies identify popular books and music and recommend popular news and books to potential customers. However, existing tools are increasingly unable to handle data sets of this size. Google was the first company to propose the MapReduce programming model, which can process petabyte-scale data in parallel on clusters of inexpensive computers. This solution has attracted many companies in both academia and industry, because they face the same data-expansion challenge but often lack the ability to develop their own tools. Open-source software such as Hadoop and OpenStack offers these companies a promising path: they can store huge amounts of data on inexpensive computer clusters and apply the MapReduce model to process this massive data in parallel, saving substantial computing and storage costs.

An existing Hadoop cluster contains various types of jobs. Some jobs have no completion-time requirement, while others must finish within a given deadline to limit losses for the company. This thesis therefore focuses on the scheduling and execution flow of the various jobs in a cluster. After analyzing existing MapReduce scheduling techniques, we find that current scheduling algorithms cannot satisfy the stricter requirements of time-critical jobs, so there is room to improve job scheduling. We designed and implemented a double-queue job scheduler and studied a method for identifying slow nodes in the cluster. In addition, we studied the slow tasks that delay a job's finish time: for such a task, a speculative copy is launched on another node to speed up its completion.
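The double-queue idea described above can be sketched as two separate queues served by one scheduler: deadline jobs are dispatched first, ordered by earliest deadline, while other jobs run first-in-first-out. The thesis does not give its scheduler's actual code or policy details, so the class, method names, and earliest-deadline-first ordering below are illustrative assumptions:

```python
import heapq
from collections import deque

class DoubleQueueScheduler:
    """Sketch of a double-queue job scheduler: jobs with a deadline are
    served first (earliest deadline first); jobs without one run FIFO.
    Names and ordering policy are illustrative, not the thesis's code."""

    def __init__(self):
        self._deadline = []         # min-heap keyed on deadline
        self._best_effort = deque() # FIFO queue for jobs with no deadline
        self._seq = 0               # tie-breaker for equal deadlines

    def submit(self, job, deadline=None):
        if deadline is None:
            self._best_effort.append(job)
        else:
            heapq.heappush(self._deadline, (deadline, self._seq, job))
            self._seq += 1

    def next_job(self):
        """Return the next job to dispatch, or None if both queues are empty."""
        if self._deadline:
            return heapq.heappop(self._deadline)[2]
        if self._best_effort:
            return self._best_effort.popleft()
        return None
```

For example, if a best-effort job is submitted first and two deadline jobs arrive afterwards, the scheduler still dispatches the deadline jobs first, tightest deadline leading.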
The purpose of these improvements is to reduce the waste of resources in the cluster and fulfill users' needs. Finally, we set up a Hadoop cluster running our double-queue scheduler to verify the performance of the improved scheduling algorithm. With various types of jobs in the cluster, the scheduler performs better when deadline jobs are present and shortens job finish times, thereby improving the utilization of cluster resources and meeting the needs of various users.
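The speculative-task mechanism mentioned in the abstract (launching a backup copy of a slow task on another node) is commonly driven by a progress-rate check: a task progressing much more slowly than its peers is flagged as a straggler. The function name and the 0.5 threshold below are illustrative assumptions, not the thesis's actual detection method:

```python
def find_stragglers(progress_rates, slow_ratio=0.5):
    """Flag tasks whose progress rate falls below slow_ratio times the
    mean rate of all running tasks; a speculative (backup) copy of each
    flagged task would then be launched on another node.

    progress_rates: dict mapping task id -> progress per unit time.
    The slow_ratio threshold is an illustrative assumption.
    """
    if not progress_rates:
        return []
    mean_rate = sum(progress_rates.values()) / len(progress_rates)
    return [tid for tid, rate in progress_rates.items()
            if rate < slow_ratio * mean_rate]
```

Only clearly lagging tasks are duplicated, which is what keeps speculation from wasting the very cluster resources it is meant to save.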
Keywords/Search Tags: Hadoop, job scheduling, cloud computing, speculative task