
Research On Hadoop Platform And Its Job Scheduling Algorithm

Posted on: 2015-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhang
GTID: 2298330434452328
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer technology and the fast expansion of Internet use, cloud computing, now a popular distributed computing model, has changed the traditional service model of the Internet. A cluster made up of many cheap commodity machines can provide users with computing power, storage capacity, and information services. Cloud computing has technically addressed current Internet problems such as mass data processing, massively parallel computing, and multi-user support, and it has brought revolutionary technological change to the whole Internet.

Hadoop is a widely used open-source cloud platform with the advantages of high reliability, good scalability, and low cost. Job scheduling is an important factor affecting the overall performance and resource utilization of Hadoop clusters. It bears not only on the orderly operation of the cluster but also on whether the whole cluster can use system resources effectively to meet users' needs. However, as application environments change, the existing job scheduling algorithms find it difficult to meet users' varied needs, so research on and improvement of these algorithms is of great significance.

Through analysis of Hadoop's existing scheduling algorithms, this thesis presents several improved strategies. To overcome defects in the Capacity Scheduling algorithm, a priority-based weighted Capacity Scheduling algorithm is presented. The algorithm calculates a weight for each job from its priority, waiting time, and other factors, and sorts the job queue by job weight. This method not only avoids falling into a locally optimal solution but also meets job requirements better.
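The weighting step can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the linear combination below, the coefficients `PRIORITY_COEFF` and `WAIT_COEFF`, and the `Job` fields are all assumptions made for illustration, not the thesis's actual definition.

```python
from dataclasses import dataclass

# Hypothetical coefficients: the abstract does not state the real formula.
PRIORITY_COEFF = 10.0   # weight contributed per priority level (assumption)
WAIT_COEFF = 0.1        # weight contributed per second of waiting (assumption)


@dataclass
class Job:
    name: str
    priority: int        # larger means more important (assumption)
    submit_time: float   # submission time in seconds


def job_weight(job: Job, now: float) -> float:
    """Combine priority and waiting time into a single weight."""
    waiting = max(0.0, now - job.submit_time)
    return PRIORITY_COEFF * job.priority + WAIT_COEFF * waiting


def order_queue(jobs: list, now: float) -> list:
    """Return the jobs sorted so that the heaviest job is scheduled first."""
    return sorted(jobs, key=lambda j: job_weight(j, now), reverse=True)


if __name__ == "__main__":
    queue = [
        Job("high_new", priority=3, submit_time=150.0),
        Job("mid_old", priority=2, submit_time=0.0),
        Job("low_new", priority=1, submit_time=190.0),
    ]
    for job in order_queue(queue, now=200.0):
        print(job.name, job_weight(job, 200.0))
```

Under these assumed coefficients, a lower-priority job that has waited long enough can move ahead of a newer, higher-priority job, which is one way a weight of this kind keeps the queue from being ordered by priority alone.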
Considering the conflict between job response time and task data locality in the delay scheduler, the new scheduling algorithm builds on delay scheduling by further deciding whether a job should be delayed, trying to improve the response time of some jobs at the cost of some data locality.

The algorithms were tested on a purpose-built Hadoop platform to verify their performance. The results suggested that the priority-based weighted Capacity Scheduling algorithm improved cluster efficiency and could shorten the response time of some small jobs. The experimental results also showed that, although the task data locality of the modified delay scheduling algorithm was lower than that of the delay scheduling algorithm, it shortened job response time and performed better than the fair scheduling algorithm, which supports the idea behind the improved algorithm.
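The trade-off between locality and response time in delay scheduling can be sketched as follows. In plain delay scheduling, a job with no data-local task for the free node is skipped for a bounded number of scheduling opportunities before it is allowed to run non-locally. The abstract does not state which criterion the modified algorithm uses to decide whether a job is delayed, so the rule in `should_delay` (small jobs never wait) and the constants `MAX_SKIPS` and `SMALL_JOB_TASKS` are purely illustrative assumptions.

```python
from dataclasses import dataclass

MAX_SKIPS = 3          # hypothetical bound on how often a job may be skipped
SMALL_JOB_TASKS = 2    # hypothetical "small job" threshold, not from the thesis


@dataclass
class Job:
    name: str
    pending: dict      # task id -> node that holds the task's input data
    skips: int = 0     # scheduling opportunities this job has passed up


def should_delay(job: Job) -> bool:
    """Extra check on top of delay scheduling (assumed rule): small jobs
    give up locality at once; other jobs wait at most MAX_SKIPS times."""
    if len(job.pending) <= SMALL_JOB_TASKS:
        return False
    return job.skips < MAX_SKIPS


def assign(jobs: list, free_node: str):
    """Pick a task for free_node; return (job name, task id, is_local)."""
    for job in jobs:
        if not job.pending:
            continue
        local = [t for t, node in job.pending.items() if node == free_node]
        if local:
            task = local[0]
        elif should_delay(job):
            job.skips += 1          # wait for a node that holds the data
            continue
        else:
            task = next(iter(job.pending))  # run non-locally
        job.skips = 0
        del job.pending[task]
        return job.name, task, bool(local)
    return None


if __name__ == "__main__":
    big = Job("big", {"b1": "n1", "b2": "n1", "b3": "n1"})
    small = Job("small", {"s1": "n1"})
    print(assign([big, small], "n2"))  # big is delayed, small runs non-locally
    print(assign([big, small], "n1"))  # big gets a data-local task
```

In this sketch the small job starts immediately on a node without its data, while the larger job keeps waiting for a local slot, which mirrors the stated goal of shortening the response time of some jobs at the cost of some data locality.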
Keywords/Search Tags: cloud computing, Hadoop, job scheduling, MapReduce programming model