
Research Of The Job Scheduling Algorithm On Hadoop Cloud Platform

Posted on: 2015-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: L M Yue
Full Text: PDF
GTID: 2298330434957711
Subject: Computer application technology
Abstract/Summary:
Internet giants such as Google, Yahoo, Amazon, and Microsoft hold enormous amounts of data. This exponential data growth raises many problems, so they have had to develop new technologies to analyze terabyte- and petabyte-scale data sets and extract useful information. Such information helps these companies identify popular books and music and recommend popular news and books to potential customers. However, existing tools are increasingly unable to handle data sets of this size. Google was the first company to propose the MapReduce programming model, which can process petabyte-scale data in parallel on clusters of inexpensive computers. This solution has attracted many companies in both academia and industry, because they face the same data-expansion challenge but often lack the ability to develop their own tools. Open-source software such as Hadoop and OpenStack offers these companies a promising path: they can store huge amounts of data on inexpensive computer clusters and apply the MapReduce model to process this massive data in parallel, saving substantial computing and storage costs.

An existing Hadoop cluster contains various types of jobs. Some jobs have no completion-time requirement, while others must finish within a given deadline to limit losses for the company. This thesis therefore focuses on the scheduling and execution flow of the various jobs in a cluster. After analyzing existing MapReduce scheduling techniques, we find that current scheduling algorithms cannot satisfy the stricter requirements of time-critical jobs, so there is room to improve job scheduling. We designed and implemented a double-queue job scheduler and studied a method for identifying slow nodes in the cluster. In addition, we studied the slow tasks that delay a job's finish time: for such a task, a speculative copy is launched on another node to speed up its completion.
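The double-queue idea described above can be sketched as two separate queues served by one scheduler: deadline jobs are dispatched first, ordered by earliest deadline, while other jobs run first-in-first-out. The thesis does not give its scheduler's actual code or policy details, so the class, method names, and earliest-deadline-first ordering below are illustrative assumptions:

```python
import heapq
from collections import deque

class DoubleQueueScheduler:
    """Sketch of a double-queue job scheduler: jobs with a deadline are
    served first (earliest deadline first); jobs without one run FIFO.
    Names and ordering policy are illustrative, not the thesis's code."""

    def __init__(self):
        self._deadline = []         # min-heap keyed on deadline
        self._best_effort = deque() # FIFO queue for jobs with no deadline
        self._seq = 0               # tie-breaker for equal deadlines

    def submit(self, job, deadline=None):
        if deadline is None:
            self._best_effort.append(job)
        else:
            heapq.heappush(self._deadline, (deadline, self._seq, job))
            self._seq += 1

    def next_job(self):
        """Return the next job to dispatch, or None if both queues are empty."""
        if self._deadline:
            return heapq.heappop(self._deadline)[2]
        if self._best_effort:
            return self._best_effort.popleft()
        return None
```

For example, if a best-effort job is submitted first and two deadline jobs arrive afterwards, the scheduler still dispatches the deadline jobs first, tightest deadline leading.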
The purpose of these improvements is to reduce the waste of resources in the cluster and fulfill users' needs. Finally, we set up a Hadoop cluster running our double-queue scheduler to verify the performance of the improved scheduling algorithm. With various types of jobs in the cluster, the scheduler performs better when deadline jobs are present and shortens job finish times, thereby improving the utilization of cluster resources and meeting the needs of various users.
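The speculative-task mechanism mentioned in the abstract (launching a backup copy of a slow task on another node) is commonly driven by a progress-rate check: a task progressing much more slowly than its peers is flagged as a straggler. The function name and the 0.5 threshold below are illustrative assumptions, not the thesis's actual detection method:

```python
def find_stragglers(progress_rates, slow_ratio=0.5):
    """Flag tasks whose progress rate falls below slow_ratio times the
    mean rate of all running tasks; a speculative (backup) copy of each
    flagged task would then be launched on another node.

    progress_rates: dict mapping task id -> progress per unit time.
    The slow_ratio threshold is an illustrative assumption.
    """
    if not progress_rates:
        return []
    mean_rate = sum(progress_rates.values()) / len(progress_rates)
    return [tid for tid, rate in progress_rates.items()
            if rate < slow_ratio * mean_rate]
```

Only clearly lagging tasks are duplicated, which is what keeps speculation from wasting the very cluster resources it is meant to save.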
Keywords/Search Tags: Hadoop, job scheduling, cloud computing, speculative task