Data Locality-Aware Task Scheduler for Hadoop

Posted on: 2016-05-06 | Degree: Master | Type: Thesis
Country: China | Candidate: L Zeng | Full Text: PDF
GTID: 2308330470476914 | Subject: Computer application technology
Abstract/Summary:
Task scheduling is a core function of distributed parallel computing platforms such as Hadoop and Dryad. Scheduling decisions strongly affect system throughput, cluster resource utilization, and job performance. At the same time, the heterogeneity of computing clusters, the variability of workloads, and the diversity of task characteristics make task scheduling one of the most difficult problems in distributed clusters. This thesis surveys the scheduling algorithms of distributed parallel computing platforms. Based on an in-depth study of the strengths and weaknesses of the task scheduling algorithms in the current Hadoop system, especially with respect to resource redistribution in MapReduce, it proposes the following improvements and optimizations.

1) The shortcoming of queue-based task scheduling is that it does not make full use of data locality. The problem of scheduling Map tasks in the current Hadoop system is mapped onto a flow network and transformed into a maximum flow problem. Solving the maximum flow problem yields a better schedule, which saves considerable network overhead.

2) To further reduce network overhead, the algorithm in 1) is improved to take into account the degree of preference for mapping a task to a node.

3) The bandwidth of the core router often limits the overall data transfer rate. To conserve this scarce bandwidth, the Reduce task scheduling is improved so that less data is transmitted across racks in the Shuffle stage.

Finally, experiments were conducted in a real-world Hadoop environment to validate the effectiveness and efficiency of the scheduling model and algorithm. The experimental results show that the algorithm performs very well in reducing the amount of network data transmitted by MapReduce applications.
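
The maximum-flow formulation of Map task scheduling can be illustrated with a small sketch. This is not the thesis's actual construction, which the abstract does not spell out; it is one plausible reading under stated assumptions: the source connects to each task with capacity 1, each task connects to every node that stores a replica of its input block, and each node connects to the sink with capacity equal to its free Map slots. The function name `max_local_assignment` and its parameters `replicas` and `slots` are hypothetical.

```python
from collections import defaultdict, deque


def max_local_assignment(replicas, slots):
    """Assign map tasks to nodes that hold their input, maximizing local tasks.

    replicas: dict task -> iterable of nodes storing the task's input block
    slots:    dict node -> number of free map slots (assumed non-negative ints)
    Returns a dict task -> node for the tasks that can run data-locally.
    """
    SRC, SNK = ("src",), ("snk",)
    cap = defaultdict(int)   # residual capacity per directed edge
    adj = defaultdict(set)   # adjacency, including reverse edges

    def add_edge(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)        # reverse edge for the residual graph

    for task, nodes in replicas.items():
        add_edge(SRC, ("t", task), 1)
        for node in nodes:
            if node in slots:
                add_edge(("t", task), ("n", node), 1)
    for node, free in slots.items():
        add_edge(("n", node), SNK, free)

    while True:
        # Breadth-first search for a shortest augmenting path (Edmonds-Karp).
        parent = {SRC: None}
        queue = deque([SRC])
        while queue and SNK not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if SNK not in parent:
            break
        # Every path crosses a unit-capacity source edge, so push one unit.
        v = SNK
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

    # A task->node edge carries flow when its reverse residual edge is positive.
    return {
        task: node
        for task, nodes in replicas.items()
        for node in nodes
        if cap[(("n", node), ("t", task))] > 0
    }
```

For example, with `replicas = {"t1": ["A"], "t2": ["A", "B"]}` and one free slot on each of `A` and `B`, a greedy queue-based scheduler that happens to place `t2` on `A` leaves `t1` without a local slot, whereas the flow solution places `t1` on `A` and `t2` on `B`. Tasks left unassigned by this sketch would have to run non-locally; how the thesis handles them, and how it weights task-to-node preference in item 2), is not stated in the abstract.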
Keywords/Search Tags: big data, Hadoop, data locality, task scheduling