Data Locality-Aware Task Scheduler For Hadoop

Posted on:2016-05-06

Degree:Master

Type:Thesis

Country:China

Candidate:L Zeng

Full Text:PDF

GTID:2308330470476914

Subject:Computer application technology

Abstract/Summary:

Task scheduling is a core function of distributed parallel computing platforms such as Hadoop and Dryad. Scheduling decisions strongly affect system throughput, cluster resource utilization, and job performance. At the same time, the heterogeneity of computing clusters, the variability of workloads, and the diversity of task characteristics make task scheduling one of the most difficult problems in distributed clusters. This thesis studies a range of scheduling algorithms for distributed parallel computing platforms. Based on in-depth research into the strengths and weaknesses of the task scheduling algorithms in the current Hadoop system, especially with respect to resource redistribution in MapReduce, it proposes the following improvements and optimizations.

1) Queue-based task scheduling does not make full use of data locality. The problem of scheduling Map tasks in the current Hadoop system is mapped onto a flow network and transformed into a maximum flow problem. Solving the maximum flow problem yields a better schedule, which saves considerable network overhead.

2) To further reduce network overhead, the algorithm in 1) is improved to take into account the degree of preference for mapping a task to a particular node.

3) The bandwidth of the core router often limits the overall data transfer rate. To conserve this scarce bandwidth, the amount of data transmitted across racks during the Shuffle stage is reduced by improving Reduce task scheduling.

Finally, experiments were conducted in a real-world Hadoop environment to validate the effectiveness and efficiency of the scheduling model and algorithm. The experimental results show that the algorithm is highly effective at reducing the amount of network data transmitted by MapReduce applications.
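The maximum flow formulation in 1) can be illustrated with a small sketch. The abstract does not give the thesis's actual network construction, so everything below is an assumption made for illustration: the source feeds each Map task with capacity 1, each task connects to the nodes that store a replica of its input block, and each node connects to the sink with capacity equal to its idle Map slots. The function name schedule_local_map_tasks and the inputs replicas and free_slots are hypothetical, and the solver is a plain Edmonds-Karp search rather than whatever the thesis uses.

```python
from collections import defaultdict, deque


def schedule_local_map_tasks(replicas, free_slots):
    """Assign Map tasks to nodes holding their input block via max flow.

    replicas:   hypothetical input, task id -> list of nodes storing its block
    free_slots: hypothetical input, node id -> number of idle Map slots
    Returns task id -> node for every task that can run data-locally.
    """
    source, sink = ("source",), ("sink",)
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities

    # Build the assumed network: source -> task -> replica node -> sink.
    for task, nodes in replicas.items():
        cap[source][("task", task)] = 1
        for node in nodes:
            if node in free_slots:
                cap[("task", task)][("node", node)] = 1
    for node, slots in free_slots.items():
        cap[("node", node)][sink] = slots

    while True:
        # Breadth-first search for a shortest augmenting path (Edmonds-Karp).
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left: the flow is maximum

        # Every path starts on a capacity-1 source edge, so it carries 1 unit.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u

    # A task -> node edge carries flow when its reverse residual edge is 1.
    assignment = {}
    for task, nodes in replicas.items():
        for node in nodes:
            if cap[("node", node)][("task", task)] == 1:
                assignment[task] = node
                break
    return assignment


# Toy cluster: node A stores blocks for all three tasks but has one idle slot.
example = schedule_local_map_tasks(
    {"t1": ["A", "B"], "t2": ["A"], "t3": ["A", "C"]},
    {"A": 1, "B": 1, "C": 1},
)
print(example)
```

In this toy cluster a first-come queue could give node A's only slot to t1 and force t2 to read its block over the network, while the flow solution places all three tasks on nodes that hold their data. The preference weighting in 2) and the rack-aware Reduce placement in 3) are not modeled here.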

Keywords/Search Tags:

big data, Hadoop, data locality, task scheduling

Related items

1	Research On Data Locality Of Hadoop Task Scheduling
2	Research And Improvement Of Task Scheduling Algorithm In Hadoop
3	Research On Scheduling Strategy Based On Hadoop
4	The Research On Distributed Task Scheduling Algorithms Based On Hadoop Platform
5	Task Scheduling Research And Application Of Big Data In Distributed Environment
6	Hadoop Task Scheduling Algorithm Optimization About Data Locality
7	Hadoop Job Scheduling Research And Optimization About Data Locality
8	The Research On High Performance Task Scheduling Technology Based On MapReduce In Cloud Computing
9	Research On Cloud Task Scheduling Algorithms Based On MapReduce
10	Optimization And Research On Reduce Task Scheduling Strategy And Data Skew On Hadoop