MapReduce Job Scheduling for Heterogeneous Geo-distributed Clusters

Posted on: 2020-02-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 1488306557485124
Subject: Computer application technology
Abstract/Summary:
The main problem of MapReduce job/workflow scheduling is to assign tasks to servers reasonably in heterogeneous geo-distributed clusters. Because servers in cloud centers are heterogeneous, geo-distributed and limited, data are randomly distributed, and MapReduce jobs/workflows have heterogeneous resource requirements, scheduling MapReduce jobs/workflows with respect to deadlines, data locality and resource utilization poses many challenges. This dissertation considers MapReduce job/workflow scheduling in heterogeneous geo-distributed clusters with three separate objectives: minimizing makespan, minimizing energy consumption and maximizing benefit. The main contributions are summarized as follows:

1. Time-aware MapReduce job scheduling in heterogeneous geo-distributed clusters is considered to minimize makespan subject to deadlines, data locality and an adaptive heartbeat interval. Map tasks of a job are processed in parallel in different clusters to decrease data transmission times. Reduce tasks of a job are processed in a single cluster, the one with the minimal estimated job completion time according to the shuffle times of intermediate data and the processing times of reduce tasks. Based on the maximum data volume of tasks, the deadline of each MapReduce job is divided into a deadline for its map tasks and a deadline for its reduce tasks. The scheduling is formulated as an Assignment Problem, in which adaptive heartbeats are calculated from the processing times of tasks. In each heartbeat, jobs are sequenced by their divided deadlines and tasks are scheduled by the Hungarian algorithm to decrease job completion times. Experimental results show that the proposed algorithms outperform existing approaches.

2. Energy-aware MapReduce job scheduling in heterogeneous geo-distributed clusters is considered to minimize energy consumption subject to deadlines and data locality. The scheduling problem is modeled and a dynamic MapReduce job scheduling framework is proposed. Jobs are sequenced according to deadline constraints, the allocated number of job slots and the possible processing times of jobs. Tasks are scheduled to promising slots on their rack-local servers, cluster-local servers and remote servers in order to improve data locality. A procedure for updating the available slots in clusters is proposed that not only finds available slots but also improves server resource utilization: fuzzy logic determines the available number of slots from the current CPU, memory and bandwidth utilization. Experimental results show that, with a variable total number of slots, the proposed heuristic consumes less energy than algorithms adapted from the literature.

3. Benefit-aware MapReduce workflow scheduling in a heterogeneous cloud center is considered to maximize the benefit of resource managers subject to deadlines and data locality. The scheduling problem is modeled and a modified workflow scheduling architecture is designed. A workflow scheduling framework consisting of workflow conversion, deadline division, task list construction and task scheduling is also proposed. MapReduce workflows are converted by Dynamic Programming according to Chain-Map/Chain-Reduce in order to decrease transmission times among jobs. Based on execution time, float time [1] and job level, the deadlines of the converted workflows are divided into sub-deadlines for the jobs in each workflow. Four different task list constructions are designed according to the sequences of workflows, MapReduce jobs and tasks. To improve data locality, a replica strategy is adapted for MapReduce workflows and tasks are scheduled to the servers with the earliest finish time. Experimental results show that the proposed heuristic yields more total benefit than the other adapted algorithms.
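As an illustration of the assignment step in the first contribution, the sketch below solves a small task-to-slot assignment problem with the Hungarian algorithm. The abstract does not give the cost model, so the matrix here is an assumption: cost[i][j] stands for a hypothetical estimated completion time of task i on slot j, and the function name assign_tasks is likewise illustrative rather than taken from the dissertation.

```python
def assign_tasks(cost):
    """Hungarian algorithm (potentials form, O(n^2 * m)).

    cost[i][j]: assumed estimated completion time of task i on slot j.
    Requires len(cost) <= len(cost[0]) (no more tasks than slots).
    Returns (assignment, total), where assignment[i] is the slot of task i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)      # potentials of tasks (1-indexed)
    v = [0] * (m + 1)      # potentials of slots (1-indexed)
    p = [0] * (m + 1)      # p[j]: task currently matched to slot j (0 = none)
    way = [0] * (m + 1)    # back-pointers along the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so that one more reduced cost becomes zero.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free slot: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break

    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total


if __name__ == "__main__":
    # Three tasks, three free slots in one heartbeat (made-up times).
    times = [[4, 1, 3],
             [2, 0, 5],
             [3, 2, 2]]
    print(assign_tasks(times))  # ([1, 0, 2], 5)
```

In a scheduler along the lines described above, such a matrix would presumably be rebuilt at every heartbeat from the tasks of the jobs sequenced first and the slots that are currently free; how the dissertation actually constructs it is not stated in the abstract.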
Keywords/Search Tags: MapReduce job, Heterogeneous environment, Deadline, Data locality, Resource utilization