Design Of MapReduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster

Posted on:2019-06-24

Degree:Master

Type:Thesis

Country:China

Candidate:M Wang

Full Text:PDF

GTID:2428330545460437

Subject:Computer application technology

Abstract/Summary:

The rapid development of Internet applications has led to an era of information explosion. Most of this data is non-relational, in either unstructured or semi-structured format, and the volume generated, even on a daily basis, far exceeds the storage capacity and processing capability of any conventional stand-alone computer. Distributed computing coupled with cloud computing has proved to be an effective solution. In the past decade, several distributed computing frameworks have been developed and used for big data processing. MapReduce is one of the most popular frameworks on the Hadoop platform, which provides distributed storage and parallel processing. As data volumes continue to increase, improving the performance of Hadoop clusters has become the focus of many big data applications. Many factors affect Hadoop performance, and the MapReduce scheduler is a key component that determines the overall performance of a Hadoop cluster.

In this thesis, we formulate and investigate a task scheduling problem in a heterogeneous Hadoop cluster, with the goal of minimizing the completion time of a batch of MapReduce jobs. We first design a prediction model to predict the end time of a task; the prediction is used to place the corresponding data block on a node in advance, which reduces the data transmission time and the overall job completion time. Based on this prediction model, we propose a heuristic MapReduce task matching-based scheduling algorithm, referred to as TMSA, to schedule the tasks in the Hadoop task queue. TMSA takes into account the real-time performance of each node in the cluster and the matching degree between nodes and tasks in order to reduce job completion time on heterogeneous clusters.

The contents of this thesis are summarized as follows:

(1) This thesis formulates a MapReduce task scheduling problem and proves it to be NP-complete.

(2) To solve the data locality problem of map tasks, this thesis proposes a map task execution time prediction model to predict the end time of a task. The prediction model identifies the node with the earliest completion time and is used by the scheduling algorithm to find the best-matched task for that node. With the predicted task execution time, the task data can be transferred to the execution node in advance, which avoids runtime data transmission and hence reduces the overall job completion time.

(3) To solve the MapReduce task scheduling problem, we propose a heuristic MapReduce task matching-based scheduling algorithm, referred to as TMSA, to schedule the tasks in the Hadoop task queue by taking into account the real-time performance of each node in the cluster and the matching degree between nodes and tasks.

(4) The map task execution time prediction model and TMSA proposed in this thesis are tested and evaluated in small-scale real-life Hadoop clusters and on a CloudSim-based simulation platform, respectively. For different test procedures, the experimental results show that, compared with FIFO and DPMQS, TMSA reduces the job completion time by 28.3%, 29.2% and 9.5%, 13.7%, respectively. The accuracy of the map task execution time prediction model is over 90% in the case of 5000 samples.
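The scheduling idea summarized above (take the node predicted to become free earliest, then give it its best-matched task) can be illustrated with a minimal Python sketch. This is not the thesis's TMSA or its prediction model, neither of which is specified in the abstract. The `Node` and `Task` classes, the `NETWORK_MB_PER_S` bandwidth constant, the run-time estimate (compute time plus transfer time for non-local blocks), and the use of "shortest predicted run time" as a stand-in for the matching degree are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

NETWORK_MB_PER_S = 50.0  # assumed transfer bandwidth (hypothetical value)


@dataclass
class Node:
    name: str
    speed_mb_per_s: float          # assumed processing rate of this node
    free_at: float = 0.0           # time at which the node becomes idle
    local_blocks: set = field(default_factory=set)


@dataclass
class Task:
    name: str
    block: str                     # identifier of the input data block
    size_mb: float


def predict_runtime(task, node):
    """Toy predictor: compute time, plus transfer time if the block is remote."""
    compute = task.size_mb / node.speed_mb_per_s
    is_local = task.block in node.local_blocks
    transfer = 0.0 if is_local else task.size_mb / NETWORK_MB_PER_S
    return compute + transfer


def schedule(tasks, nodes):
    """Greedy matching: earliest-free node gets its shortest-predicted task."""
    pending = list(tasks)
    plan = []
    while pending:
        # Node that is predicted to become idle first.
        node = min(nodes, key=lambda n: n.free_at)
        # Stand-in for "best-matched task": lowest predicted run time on that node.
        task = min(pending, key=lambda t: predict_runtime(t, node))
        finish = node.free_at + predict_runtime(task, node)
        plan.append((task.name, node.name, finish))
        node.free_at = finish
        node.local_blocks.add(task.block)  # the block now resides on this node
        pending.remove(task)
    makespan = max(n.free_at for n in nodes)
    return plan, makespan


if __name__ == "__main__":
    nodes = [
        Node("fast", 100.0, local_blocks={"b1"}),
        Node("slow", 50.0, local_blocks={"b2"}),
    ]
    tasks = [Task("t1", "b1", 100.0), Task("t2", "b2", 100.0), Task("t3", "b3", 100.0)]
    plan, makespan = schedule(tasks, nodes)
    for task_name, node_name, finish in plan:
        print(f"{task_name} -> {node_name}, predicted finish at {finish:.1f}s")
    print(f"batch completion time: {makespan:.1f}s")
```

In this toy setup, data-local tasks are preferred because the transfer term makes remote blocks look slower. A fuller model would presumably also account for live node load, which the abstract mentions but does not detail.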

Keywords/Search Tags:

Distributed Computing Framework, Hadoop, MapReduce, Yarn, Scheduler

Related items

1	Research And Implementation Of High-Responsive Hadoop Computing Resource Scheduler Based On YARN
2	Based On The Research Of Parallel Computing Framework Of YARN
3	Research And Optimization Of YARN-Based Hybrid Structure Scheduler
4	Based On Improved Hadoop Yarn Scheduler Design And Implementation Of Large Data Supporting Platform
5	Design And Implementation Of Hadoop Resource-aware Scheduler
6	The Design Of The Cloud Computing System Based On Hadoop
7	Research On The Energy-aware Scheduler For Hadoop
8	Research And Improvement Of Job Scheduling Algorithm Based On Hadoop
9	The Design And Implementation Of Health Information System Framework Based On Hadoop
10	Design And Implementation Of YARN Resource Scheduling Strategy Optimization Method