
The Research And Optimization Of Job Schedule Algorithm In Hadoop

Posted on: 2016-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: C J Tao
Full Text: PDF
GTID: 2308330470957790
Subject: Control Science and Engineering
Abstract/Summary:
As cloud computing technology has matured, it has come to provide enterprises with a big data solution. Hadoop is an open-source distributed cloud computing platform maintained by the Apache Software Foundation, and many companies use it to handle big data because of its high reliability, scalability, and fault tolerance. Hadoop MapReduce, a distributed data processing framework, is a core component of Hadoop. Its job scheduling algorithm determines the performance of MapReduce and therefore affects the performance of the whole Hadoop system. In the existing Hadoop job scheduling algorithms, Reduce task scheduling is too simple, which restricts system performance. On the one hand, the Reduce tasks of small jobs suffer starvation and the Hadoop system has low resource utilization; on the other hand, Hadoop does not take the data locality of Reduce tasks into consideration.

This thesis focuses on Hadoop job scheduling and proposes optimized algorithms for Hadoop Reduce task scheduling. The main work is as follows:

1) The thesis analyzes the starvation of small jobs' Reduce tasks and the low resource utilization of the Hadoop system, and proposes a task time estimation model. Based on this time model, it introduces an improved algorithm, SBOTM, and embeds it into the widely used fair scheduler. Compared with the native fair scheduler, the algorithm effectively reduces the starvation of small jobs' Reduce tasks and improves resource utilization.

2) The thesis analyzes in depth the data locality problem of Reduce tasks and proposes a delay scheduling algorithm, DSORT. The delay scheduling strategy is applied to optimize the data locality of Reduce tasks, and the DSORT implementation is likewise embedded into the fair scheduler. Compared with the native fair scheduler, the algorithm greatly improves the data locality of Reduce tasks, reduces network transfer cost, and shortens job execution time.

The proposed algorithms effectively improve Hadoop's Reduce task scheduling and the efficiency of job execution. They also optimize the data locality of Reduce tasks and reduce network transfer cost. In addition, the algorithms have good scalability and can be ported to other schedulers.
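The abstract does not give the details of DSORT, so the following is only a minimal Python sketch of the general delay scheduling idea it builds on: when a slot frees up on a node, a task is launched there if the node is data-local for it, and a non-local launch is allowed only after the task has been skipped a bounded number of times. The names `ReduceTask`, `pick_task`, `preferred_nodes`, and `max_skips` are hypothetical, and treating "preferred nodes" as the nodes holding most of a Reduce task's input is an assumption, not something stated in the thesis.

```python
from dataclasses import dataclass


@dataclass
class ReduceTask:
    """A pending Reduce task (hypothetical model, not Hadoop's API)."""
    job_id: str
    preferred_nodes: set   # nodes assumed to hold most of the task's input
    skipped: int = 0       # scheduling opportunities declined so far


def pick_task(pending, free_node, max_skips=3):
    """Choose a task for a free slot on free_node.

    Returns (task, is_local), or (None, False) if every waiting task
    prefers to keep waiting. `pending` is assumed to be ordered by the
    scheduler's fairness policy and is modified in place.
    """
    for task in pending:
        if free_node in task.preferred_nodes:
            # Data-local launch: take it immediately.
            pending.remove(task)
            return task, True
        if task.skipped >= max_skips:
            # The task has waited long enough; accept a non-local slot
            # so that it cannot starve.
            pending.remove(task)
            return task, False
        # Decline this slot and let a later task in the queue try it.
        task.skipped += 1
    return None, False
```

In this sketch, `max_skips` trades locality against waiting time: a larger value gives each task more chances to land on a preferred node, while a value of 0 degenerates to launching the first task in the queue regardless of locality.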
Keywords/Search Tags: Hadoop, job schedule, Reduce task, resource utilization, data locality