Research On Scheduling Algorithm In Hadoop MapReduce

Posted on: 2015-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhai
Full Text: PDF
GTID: 2308330482971057
Subject: Computer application technology
Abstract/Summary:
Cloud computing is a popular topic in both commercial and scientific research, and Hadoop, as an open-source implementation of Google's cloud platform, is an important research base. In the Hadoop architecture, the MapReduce scheduling algorithm determines the order in which jobs execute and the computing resources each job receives, so studying and improving this algorithm has a direct, positive effect on the efficiency of a Hadoop cloud platform. Because of problems in the cluster, some tasks in Hadoop run inefficiently, delay the completion of their job, and become backward tasks. Hadoop MapReduce is designed to launch a backup task for such a task, which improves the overall efficiency of the system. However, neither Hadoop MapReduce's original speculative execution method nor the other improved backup-task scheduling algorithms can effectively select the backup Reducer task. After analyzing several other MapReduce scheduling algorithms, this thesis presents an improved backup scheduling algorithm for the Reducer task.
Compared with the original algorithm, the improved algorithm has the following advantages: (1) It no longer picks out slow tasks by overall progress. Instead, it divides a task's execution into phases and, in each phase, compares the task's rate with the average execution rate in the cluster to find tasks that lag behind, which raises the accuracy of identifying backward tasks. (2) It uses three queues to store Reducer tasks at different stages and assigns appropriate tasks to a node while taking that node into account, which reduces the completion time of the backup task. (3) When selecting the node to execute a backup task, it considers the task's data locality: treating each rack as a unit, it calculates how much of the Reducer task's input data comes from each rack to decide which node will execute the task, which reduces the network load. The algorithm can meet accuracy, data locality, and other requirements in a cluster of heterogeneous nodes, so speculative tasks are launched more accurately. We built a Hadoop cluster and tested the proposed algorithm on it. We first determined the parameters required by the algorithm through extensive experiments and analysis, and then compared the improved algorithm with Hadoop's original algorithm and the LATE algorithm. Experimental results show that our algorithm reduces job completion time and therefore improves the efficiency of the system.
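The phase-wise detection and rack-based placement described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the phase names (`copy`, `sort`, `reduce`), the `ReducerTask` fields, the `slow_factor` threshold, and both function names are assumptions made for illustration, and the per-phase dictionary only stands in for the three queues the abstract mentions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class ReducerTask:
    """Hypothetical snapshot of a running Reducer task."""
    task_id: str
    phase: str                 # assumed phase names: "copy", "sort", "reduce"
    rate: float                # progress per second within the current phase
    input_bytes_by_rack: dict  # rack name -> bytes of this task's input stored there


def find_backward_tasks(tasks, slow_factor=0.5):
    """Group tasks by phase and flag those well below the phase's average rate.

    slow_factor is an assumed tuning parameter: a task counts as backward
    when its rate is below slow_factor times the average rate of its phase.
    """
    by_phase = defaultdict(list)
    for task in tasks:
        by_phase[task.phase].append(task)

    backward = {}
    for phase, group in by_phase.items():
        average_rate = mean(task.rate for task in group)
        backward[phase] = [t for t in group if t.rate < slow_factor * average_rate]
    return backward


def pick_rack(task):
    """Return the rack holding the largest share of the task's input data."""
    return max(task.input_bytes_by_rack, key=task.input_bytes_by_rack.get)


if __name__ == "__main__":
    tasks = [
        ReducerTask("r1", "copy", 1.0, {"rack-a": 500, "rack-b": 300}),
        ReducerTask("r2", "copy", 1.0, {"rack-a": 200, "rack-b": 600}),
        ReducerTask("r3", "copy", 0.2, {"rack-a": 100, "rack-b": 700}),
        ReducerTask("r4", "reduce", 0.5, {"rack-a": 400, "rack-b": 400}),
        ReducerTask("r5", "reduce", 0.4, {"rack-a": 900, "rack-b": 100}),
    ]
    for phase, slow_tasks in find_backward_tasks(tasks).items():
        for task in slow_tasks:
            print(phase, task.task_id, "-> backup on", pick_rack(task))
```

Comparing rates within a phase avoids penalizing a task simply because it is in a phase that naturally progresses more slowly than others. A backup placed in the rack that already holds most of the input should pull less data across racks.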
Keywords/Search Tags:Cloud Computing, Hadoop, MapReduce, Scheduling Algorithm, Speculative Execution