The Research And Optimization Of Job Schedule Algorithm In Hadoop

Posted on:2016-04-22

Degree:Master

Type:Thesis

Country:China

Candidate:C J Tao

Full Text:PDF

GTID:2308330470957790

Subject:Control Science and Engineering

Abstract/Summary:

With the maturation of cloud computing technology, enterprises now have practical solutions for big data. Hadoop is an open source distributed cloud computing platform maintained by the Apache Software Foundation, and it is widely used by companies to process big data because of its high reliability, scalability, and fault tolerance. Hadoop MapReduce, a distributed data processing framework, is a core component of Hadoop. Its job scheduling algorithm determines the performance of MapReduce and therefore affects the performance of the Hadoop system as a whole. In the existing Hadoop job scheduling algorithms, the scheduling of Reduce tasks is too simple, which limits system performance. On the one hand, the Reduce tasks of small jobs suffer starvation and the system has low resource utilization; on the other hand, Hadoop does not take the data locality of Reduce tasks into consideration.

This thesis focuses on Hadoop job scheduling and proposes optimized algorithms for Reduce task scheduling. The main work is as follows:

1) The thesis analyzes the starvation of small jobs' Reduce tasks and the low resource utilization of the Hadoop system, and proposes a task time estimation model. Based on this model, it introduces an improved algorithm, SBOTM, and embeds it in the widely used fair scheduler. Compared with the native fair scheduler, the algorithm effectively reduces the starvation of small jobs' Reduce tasks and improves resource utilization.

2) The thesis analyzes in depth the data locality problem of Reduce tasks and proposes a delay scheduling algorithm, DSORT, which applies the delay scheduling strategy to optimize the data locality of Reduce tasks. DSORT is likewise implemented within the fair scheduler. Compared with the native fair scheduler, the algorithm greatly improves the data locality of Reduce tasks, reduces network transfer cost, and shortens job execution time.

The proposed algorithms effectively improve Reduce task scheduling in Hadoop and the efficiency of job execution. They also optimize the data locality of Reduce tasks and reduce network transfer cost. In addition, they scale well and can be ported to other schedulers.
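The abstract does not give the details of DSORT, so the following is only a minimal sketch of the general delay scheduling idea applied to Reduce tasks, not the thesis's implementation. It assumes each job knows a set of preferred nodes (for example, nodes holding most of its map output) and declines a non-local slot a bounded number of times before accepting any node. The names `ReduceJob`, `assign_reduce`, `preferred_nodes`, and `max_skips` are hypothetical and are not part of Hadoop's API.

```python
from dataclasses import dataclass


@dataclass
class ReduceJob:
    """Hypothetical job record; field names are illustrative, not Hadoop's."""
    name: str
    preferred_nodes: set      # assumed: nodes holding most of the job's map output
    pending_reduces: int      # Reduce tasks still waiting for a slot
    skipped: int = 0          # non-local offers declined since the last launch


def assign_reduce(jobs, node, max_skips=3):
    """Return the name of the job that gets a Reduce slot on `node`, or None.

    Jobs are scanned in the order given (assumed to be the fair-share order).
    A job declines a non-local node until it has been skipped `max_skips`
    times; after that it accepts any node, so it cannot wait forever.
    """
    for job in jobs:
        if job.pending_reduces == 0:
            continue
        is_local = node in job.preferred_nodes
        if is_local or job.skipped >= max_skips:
            job.pending_reduces -= 1
            job.skipped = 0
            return job.name
        job.skipped += 1      # decline this offer and wait for a better node
    return None
```

In this sketch, `max_skips` trades locality against waiting time: a larger value gives a job more chances to land on a preferred node but delays its Reduce tasks longer when no such node frees up.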

Keywords/Search Tags:

Hadoop, job schedule, Reduce task, resource utilization, data locality

Related items

1	The Research On Distributed Task Scheduling Algorithms Based On Hadoop Platform
2	Optimization And Research On Reduce Task Scheduling Strategy And Data Skew On Hadoop
3	Data Locality-awared Task Scheduler For Hadoop
4	Research On Data Locality Of Hadoop Task Scheduling
5	Hadoop Task Scheduling Algorithm Optimization About Data Locality
6	Research And Improvement Of Task Scheduling Algorithm In Hadoop
7	Research Of Hadoop Job Scheduler Algorithm Based On Task Characteristics And Fair Strategy
8	Research On Scheduling Strategy Based On Hadoop
9	Research On Task Scheduling Algorithms Based On Pre-Release Resource List In Hadoop
10	Design And Implementation Of Image Retrieval System Based On Hadoop