Task Scheduling And Shuffle Scheduling For MapReduce Jobs

Posted on: 2020-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: W L Liu
GTID: 2428330575496978
Subject: Software engineering
Abstract/Summary:
MapReduce is a popular data-parallel processing framework in data centers. It processes large amounts of data at lower cost and with higher efficiency by splitting a job into multiple map tasks and reduce tasks and distributing them across multiple nodes. A MapReduce job consists of three phases: Map, Shuffle, and Reduce. Before running map and reduce tasks, the task nodes communicate with the data nodes to fetch the input data, so careful placement of map and reduce tasks can reduce network traffic and improve MapReduce performance. The data transmission phase between map tasks and reduce tasks is called shuffle, and on average it accounts for a large part of the job running time. Effective scheduling of shuffle data can therefore reduce the makespan of the shuffle phase and improve MapReduce performance. This thesis studies the task scheduling problem and the shuffle data transmission scheduling problem for MapReduce jobs. The main research is as follows:

(1) The task scheduling problem in MapReduce. Choosing where to schedule each task so as to minimize network traffic is critical for performance. Most current task scheduling algorithms schedule either map tasks or reduce tasks, without jointly considering the impact of both on network traffic. We propose a data Replica location-Aware Joint map and reduce Scheduling algorithm (RAJS). The algorithm determines the scheduling locations of the map tasks and the reduce tasks according to the node processing capabilities and the replica locations of the map task input data. Experiments show that RAJS can effectively reduce the data traffic in the network.

(2) The shuffle data transmission scheduling problem in MapReduce. Data center networks always carry some periodic jobs, so the network is periodically busy and idle. Existing research ignores the impact of this periodic network status on shuffle data transmission scheduling. We propose a MapReduce Shuffle Scheduling algorithm (MSS) for MapReduce jobs based on periodic network status. MSS is a 3/2-approximation algorithm for the shuffle scheduling problem when all future idle time slots have the same duration. Experimental results demonstrate that the proposed algorithm can effectively reduce the makespan of the MapReduce shuffle phase and increase network utilization.
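To make the replica-aware placement idea from item (1) more concrete, the following is a minimal Python sketch of a greedy map-task placement. It is not RAJS: the abstract does not give the algorithm's details, and this sketch covers only the map side. The function name `place_map_tasks`, the slot-count model of node capacity, and the sample cluster are all assumptions made for illustration. A task counts as "remote" when it runs on a node that holds no replica of its input split, which is when its input must cross the network.

```python
def place_map_tasks(replicas, capacity):
    """Greedy, replica-aware placement of map tasks (illustrative only).

    replicas: dict mapping task -> list of nodes holding a replica of its input
    capacity: dict mapping node -> number of free map slots (assumed model)
    Returns (placement, remote), where placement maps task -> node and
    remote lists the tasks that must read their input over the network.
    """
    free = dict(capacity)  # copy so the caller's dict is not modified
    placement = {}
    remote = []

    # Place the tasks with the fewest replica options first, so that
    # flexible tasks do not take the only local slot of a constrained one.
    for task in sorted(replicas, key=lambda t: (len(replicas[t]), t)):
        local = [n for n in replicas[task] if free.get(n, 0) > 0]
        if local:
            candidates = local
        else:
            # No replica holder has a free slot: fall back to any free node.
            candidates = [n for n in free if free[n] > 0]
            if not candidates:
                raise ValueError("not enough slots for all map tasks")
            remote.append(task)
        # Prefer the node with the most free slots; break ties by name.
        node = min(candidates, key=lambda n: (-free[n], n))
        free[node] -= 1
        placement[task] = node

    return placement, remote


# Hypothetical cluster: three nodes and four map tasks.
replicas = {"m1": ["A", "B"], "m2": ["A"], "m3": ["A"], "m4": ["C"]}
capacity = {"A": 2, "B": 1, "C": 1}
placement, remote = place_map_tasks(replicas, capacity)
print(placement["m1"], remote)  # prints: B []
```

In this example every task ends up on a node that stores its input. Placing `m1` first would instead occupy a slot on node A (the replica holder with the most free slots) and force `m2` or `m3` to read remotely, which is the kind of avoidable traffic that replica-aware scheduling targets.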
Keywords/Search Tags:MapReduce, task schedule, network traffic, shuffle schedule, makespan