
Research On Task Scheduling Algorithm Under MapReduce Framework

Posted on: 2018-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: J J Ding
GTID: 2358330512476801
Subject: Software engineering

Abstract/Summary:
Big data computing has recently become a research hotspot. Hadoop and Spark, both based on MapReduce, are the most commonly used big data cluster processing frameworks. Resource scheduling is an important factor in the performance of large-scale data processing on distributed cluster frameworks, so research on task scheduling algorithms for MapReduce-based Hadoop and Spark environments has both theoretical value and practical significance. This thesis studies batch scheduling algorithms in the Hadoop environment and resource scheduling methods for Spark offered as a service.

To minimize the maximum completion time (makespan) of batch scheduling in the Hadoop environment, this thesis models the problem as a two-stage hybrid flow shop scheduling problem with setup times and proposes two heuristic algorithms based on a DAG (Directed Acyclic Graph) model: DAGEA (Directed Acyclic Graph Earliest Available) and DAGEF (Directed Acyclic Graph Earliest Finish). Existing algorithms are usually based on a Gantt chart structure, which cannot effectively take the scheduling range of each job into account. In contrast, DAGEA and DAGEF are based on a DAG structure: they use the DAG to compute the scheduling range of each job and adjust the start times of operations accordingly, which improves the performance and efficiency of the algorithms. Simulation experiments confirm this conclusion.

Spark computes in memory, whereas Hadoop processes data on disk. Existing Spark resource scheduling methods consider only the number of idle cores and the memory requirement. This thesis additionally takes into account the utilization of each cluster node and its processing capacity, re-evaluates the resource utilization of each node, and allocates resources to tasks on that basis. The new scheduling algorithm, MEAN, partitions resources at a finer granularity, which improves resource utilization, increases the number of online Web requests that can be served, and improves concurrency.

Task scheduling and resource allocation are the key to a big data computing platform, and their quality directly determines the platform's performance. The MapReduce scheduling work in this thesis focuses on batch scheduling in the Hadoop environment and resource allocation in the Spark environment. The proposed algorithms DAGEA, DAGEF, and MEAN are shown to be effective by experiments.
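The Hadoop part of the abstract contrasts Gantt-chart-based methods with a DAG model that yields each job's scheduling range. The sketch below is not the thesis's DAGEA or DAGEF; it only illustrates the general idea under stated assumptions. It assumes the assignment of operations to machines and their order on each machine are already fixed, encodes job precedence (map before reduce) and machine order as DAG edges, models a setup time as the lag on a machine-order edge, and computes each operation's earliest and latest start with a forward and a backward longest-path pass. The function name `schedule_window`, the operation names, and the `(u, v, lag)` edge format are hypothetical.

```python
from collections import defaultdict, deque

def schedule_window(durations, edges):
    """Earliest/latest start times of operations on a precedence DAG.

    durations: dict op -> processing time
    edges: list of (u, v, lag): v may start no earlier than finish(u) + lag
           (lag is an assumed setup time; use 0 for plain precedence)
    Returns (makespan, {op: (earliest_start, latest_start)}).
    """
    succ, pred = defaultdict(list), defaultdict(list)
    indeg = {op: 0 for op in durations}
    for u, v, lag in edges:
        succ[u].append((v, lag))
        pred[v].append((u, lag))
        indeg[v] += 1

    # Kahn's algorithm gives a topological order of the operations.
    order = []
    queue = deque(op for op, d in indeg.items() if d == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(durations):
        raise ValueError("precedence graph contains a cycle")

    # Forward pass: earliest start is the longest path from any source.
    es = {}
    for v in order:
        es[v] = max((es[u] + durations[u] + lag for u, lag in pred[v]), default=0)
    makespan = max(es[op] + durations[op] for op in durations)

    # Backward pass: latest start that does not increase the makespan.
    ls = {}
    for u in reversed(order):
        latest_finish = min((ls[v] - lag for v, lag in succ[u]), default=makespan)
        ls[u] = latest_finish - durations[u]
    return makespan, {op: (es[op], ls[op]) for op in durations}

# Hypothetical instance: two jobs, one map machine and one reduce machine.
durations = {"A.map": 3, "B.map": 2, "A.red": 4, "B.red": 1}
edges = [
    ("A.map", "A.red", 0),  # job A: map before reduce
    ("B.map", "B.red", 0),  # job B: map before reduce
    ("A.map", "B.map", 1),  # map machine order, setup time 1
    ("A.red", "B.red", 2),  # reduce machine order, setup time 2
]
makespan, windows = schedule_window(durations, edges)
print(makespan, windows)
```

In this toy instance the makespan is 10, and `B.map` can start anywhere in the window [4, 7] without delaying the schedule, while the other operations have no slack. A DAG-based heuristic can exploit such windows when shifting start times; a plain Gantt chart does not expose them directly.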
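For the Spark part, the abstract says the baseline checks only idle cores and memory, while the proposed method also weighs each node's utilization and processing capacity. The sketch below contrasts the two ideas under loudly stated assumptions. The `Node` fields, the baseline's tie-breaking rule (most free cores), and the scoring formula `speed * (1 - utilization) * free_cores` are all hypothetical illustrations, not the MEAN algorithm and not Spark's actual scheduler API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    name: str
    free_cores: int
    free_mem_gb: float
    utilization: float  # hypothetical: current CPU utilization in [0, 1]
    speed: float        # hypothetical: relative processing capacity (1.0 = baseline)

def feasible(nodes: List[Node], cores: int, mem_gb: float) -> List[Node]:
    """Nodes with enough idle cores and memory for the request."""
    return [n for n in nodes if n.free_cores >= cores and n.free_mem_gb >= mem_gb]

def pick_node_baseline(nodes: List[Node], cores: int, mem_gb: float) -> Optional[Node]:
    """Assumed baseline: look only at idle cores and memory, prefer most free cores."""
    candidates = feasible(nodes, cores, mem_gb)
    return max(candidates, key=lambda n: n.free_cores) if candidates else None

def pick_node(nodes: List[Node], cores: int, mem_gb: float) -> Optional[Node]:
    """Hypothetical utilization-aware choice: scale idle cores by node speed
    and by the fraction of the node that is not already busy."""
    candidates = feasible(nodes, cores, mem_gb)
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.speed * (1.0 - n.utilization) * n.free_cores)

nodes = [
    Node("A", free_cores=8, free_mem_gb=16.0, utilization=0.9, speed=1.0),
    Node("B", free_cores=4, free_mem_gb=8.0, utilization=0.1, speed=1.5),
]
print(pick_node_baseline(nodes, 2, 4.0).name, pick_node(nodes, 2, 4.0).name)
```

Here the core-count baseline picks node A because it has more idle cores, even though A is already 90% busy. The utilization-aware score prefers the lightly loaded, faster node B. The actual weighting used by MEAN is not given in the abstract.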
Keywords/Search Tags: MapReduce, Hadoop, Schedule-dependent setup time, Spark, resource utilization rate