Research On Task Scheduling Algorithm Under MapReduce Framework

Posted on:2018-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:J J Ding

Full Text:PDF

GTID:2358330512476801

Subject:Software engineering

Abstract/Summary:

Big data computing has recently become a research hotspot. Hadoop and Spark, both based on MapReduce, are the most commonly used big data cluster processing frameworks. Resource scheduling is an important factor affecting the performance of large-scale data processing on distributed cluster frameworks. Research on task scheduling algorithms in MapReduce-based Hadoop and Spark environments therefore has important theoretical value and practical significance. This thesis studies the batch scheduling algorithm in the Hadoop environment and the resource scheduling method used when Spark is provided as a service.

To minimize the maximum completion time (makespan) of the batch scheduling problem in the Hadoop environment, this thesis transforms the problem into a two-stage hybrid flow shop scheduling problem with setup times and proposes two heuristic algorithms based on a DAG (Directed Acyclic Graph) model: DAGEA (Directed Acyclic Graph Earliest Available) and DAGEF (Directed Acyclic Graph Earliest Finish). Existing solution algorithms are usually built on a Gantt chart structure, which cannot effectively account for the scheduling range of each job. In contrast, DAGEA and DAGEF are built on the DAG structure: they use the DAG to calculate the scheduling range of each job and adjust the start times of operations accordingly, which improves both the performance and the efficiency of the algorithms. Simulation experiments verify this conclusion.

Spark performs its computation in memory, whereas Hadoop processes data on disk. Existing Spark resource scheduling methods consider only the number of idle cores and the memory requirements. This thesis additionally takes into account the utilization and the processing capability of each cluster node, re-evaluates the resource utilization of each node, and allocates resources to tasks accordingly. The new scheduling algorithm, MEAN, partitions resources at a finer granularity. It can therefore improve resource utilization, increase the number of online Web requests served, and improve concurrency.

Task scheduling and resource allocation are key to a big data computing platform, and their quality directly determines the platform's performance. The MapReduce scheduling algorithms in this thesis focus on batch scheduling in the Hadoop environment and resource allocation in the Spark environment. The proposed algorithms DAGEA, DAGEF and MEAN are shown by experiments to be effective.
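The earliest-available versus earliest-finish distinction in the names DAGEA and DAGEF can be illustrated with a plain greedy list scheduler for a two-stage hybrid flow shop. This is a minimal sketch under stated assumptions, not the thesis's algorithms: it uses no DAG, one shared setup-time table for all machines (`setup[(i, j)]` is the time to switch from job i to job j, missing entries count as 0), setups that may run before the job arrives, the given job order at stage 1, and first-come order at stage 2. The function name `makespan` and the instance format are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Proc = Dict[int, Tuple[int, int]]             # job -> (stage-1 time, stage-2 time)
Setup = Dict[Tuple[Optional[int], int], int]  # (previous job, next job) -> setup time


def makespan(order: List[int], proc: Proc, setup: Setup,
             machines: Tuple[int, int], rule: str) -> int:
    """Greedy list scheduling of a two-stage hybrid flow shop (hypothetical sketch).

    rule == "EA": assign each job to the machine that becomes free earliest.
    rule == "EF": assign each job to the machine where it would finish earliest,
                  which takes the sequence-dependent setup time into account.
    """
    ready = {j: 0 for j in order}  # time job j may start at the current stage
    for stage in (0, 1):
        # Per-machine state: [time the machine becomes free, last job processed]
        state = [[0, None] for _ in range(machines[stage])]
        # Stage 1 follows the given order; stage 2 takes jobs as they leave stage 1
        seq = order if stage == 0 else sorted(order, key=lambda j: ready[j])

        def timing(k: int, j: int) -> Tuple[int, int]:
            free, last = state[k]
            s = setup.get((last, j), 0)      # assumption: missing setups count as 0
            start = max(free + s, ready[j])  # assumption: setup may precede arrival
            return start, start + proc[j][stage]

        for j in seq:
            if rule == "EA":
                k = min(range(len(state)), key=lambda m: state[m][0])
            elif rule == "EF":
                k = min(range(len(state)), key=lambda m: timing(m, j)[1])
            else:
                raise ValueError(f"unknown rule: {rule}")
            _, end = timing(k, j)
            state[k] = [end, j]
            ready[j] = end                   # completion feeds the next stage
    return max(ready.values())


if __name__ == "__main__":
    proc = {0: (3, 2), 1: (2, 2), 2: (2, 3)}
    setup = {(1, 2): 5}  # switching from job 1 to job 2 needs a long setup
    for rule in ("EA", "EF"):
        print(rule, makespan([0, 1, 2], proc, setup, (2, 1), rule))
    # prints "EA 12" then "EF 9"
```

On this toy instance the earliest-available rule places job 2 behind job 1 and pays the long setup, while the earliest-finish rule avoids it, so the makespan drops from 12 to 9.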
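The abstract says DAGEA and DAGEF use the DAG to calculate each job's scheduling range. The abstract does not describe the exact graph, so the following is only a generic critical-path style sketch of that idea, assuming nodes are operations `(job, stage)` and arcs carry time lags for job precedence (stage 1 before stage 2) and for machine order (processing time plus setup). A forward longest-path pass gives each operation's earliest start, a backward pass from the makespan gives its latest start, and `[earliest, latest]` is the window in which its start time can be adjusted. The function `time_windows` and its input format are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Op = Tuple[int, int]  # (job, stage)


def time_windows(proc: Dict[int, Tuple[int, int]],
                 setup: Dict[Tuple[int, int], int],
                 machine_seqs: List[Tuple[int, List[int]]]):
    """Return (earliest start, latest start, makespan) for every operation.

    machine_seqs lists, for each machine, its stage and the order of jobs on it.
    """
    nodes: List[Op] = [(j, s) for j in proc for s in (0, 1)]
    succ: Dict[Op, List[Tuple[Op, int]]] = defaultdict(list)  # arcs with lags
    indeg = {n: 0 for n in nodes}

    def arc(u: Op, v: Op, lag: int) -> None:
        succ[u].append((v, lag))
        indeg[v] += 1

    for j, (p1, _) in proc.items():      # job precedence: stage 1 before stage 2
        arc((j, 0), (j, 1), p1)
    for s, seq in machine_seqs:          # machine order plus setup between jobs
        for a, b in zip(seq, seq[1:]):
            arc((a, s), (b, s), proc[a][s] + setup.get((a, b), 0))

    # Topological order (Kahn's algorithm); a cycle means an infeasible sequence
    topo: List[Op] = []
    queue = deque(n for n in nodes if indeg[n] == 0)
    while queue:
        u = queue.popleft()
        topo.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(topo) != len(nodes):
        raise ValueError("machine sequences create a cycle")

    # Forward pass: earliest start = longest path from time 0
    es = {n: 0 for n in nodes}
    for u in topo:
        for v, lag in succ[u]:
            es[v] = max(es[v], es[u] + lag)
    cmax = max(es[(j, s)] + proc[j][s] for j, s in nodes)

    # Backward pass: latest start that keeps the makespan at cmax
    ls = {(j, s): cmax - proc[j][s] for j, s in nodes}
    for u in reversed(topo):
        for v, lag in succ[u]:
            ls[u] = min(ls[u], ls[v] - lag)
    return es, ls, cmax


if __name__ == "__main__":
    proc = {0: (3, 2), 1: (2, 2), 2: (2, 3)}
    setup = {(1, 2): 5}
    # Stage-1 machine 0 runs jobs 0 then 2, machine 1 runs job 1;
    # the single stage-2 machine runs jobs 1, 0, 2
    seqs = [(0, [0, 2]), (0, [1]), (1, [1, 0, 2])]
    es, ls, cmax = time_windows(proc, setup, seqs)
    for op in sorted(es):
        print(op, "window", (es[op], ls[op]))
    print("makespan", cmax)
```

Operations whose window has zero width lie on the critical path; the others, such as job 0 at stage 1 with window (0, 1), can be shifted without lengthening the schedule.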
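The Spark-side idea, ranking nodes by their current utilization and processing capability in addition to checking free cores and memory, can be sketched as a greedy placement routine. The scoring formula `speed * (1 - utilization)`, the utilization update after each placement, and the names `Node`, `score` and `place` are all hypothetical illustrations; they are not MEAN as defined in the thesis, and the abstract does not state its formula.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    total_cores: int
    free_cores: int
    free_mem_gb: float
    utilization: float  # current CPU utilization in [0, 1] (hypothetical metric)
    speed: float        # relative processing capability, 1.0 = baseline (hypothetical)


def score(n: Node) -> float:
    # Hypothetical score: processing capability discounted by current load
    return n.speed * (1.0 - n.utilization)


def place(tasks: List[Tuple[int, float]], nodes: List[Node]) -> List[Optional[str]]:
    """Assign each (cores, mem_gb) request to the best-scoring node that fits."""
    placement: List[Optional[str]] = []
    for cores, mem in tasks:
        # The core/memory check that existing schedulers already perform
        fits = [n for n in nodes if n.free_cores >= cores and n.free_mem_gb >= mem]
        if not fits:
            placement.append(None)  # nothing fits; the request has to wait
            continue
        best = max(fits, key=score)  # extra criterion: load and capability
        best.free_cores -= cores
        best.free_mem_gb -= mem
        # Assumption: a crude estimate of the load the new task adds
        best.utilization = min(1.0, best.utilization + cores / best.total_cores)
        placement.append(best.name)
    return placement


if __name__ == "__main__":
    nodes = [
        Node("A", total_cores=8, free_cores=4, free_mem_gb=16, utilization=0.5, speed=1.0),
        Node("B", total_cores=8, free_cores=6, free_mem_gb=8, utilization=0.2, speed=0.8),
        Node("C", total_cores=4, free_cores=2, free_mem_gb=4, utilization=0.1, speed=1.5),
    ]
    print(place([(2, 2.0)] * 3 + [(8, 1.0)], nodes))  # ['C', 'B', 'A', None]
```

A scheduler that looked only at free cores would send the first request to node B; folding load and speed into the ranking sends it to the lightly loaded, faster node C instead.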

Keywords/Search Tags:

MapReduce, Hadoop, Sequence-dependent setup time, Spark, Resource utilization

Related items

1	The Reaserch And Optimize Of A New Hadoop Job Scheduling Algorithm Based On MLFQ
2	Research On The Performance And Optimization Of MapReduce Model In Hadoop Platform
3	The Research And Optimization Of Job Schedule Algorithm In Hadoop
4	Research On Hybrid Flow Shop Scheduling Problem Considering Learning Effect And Sequence-Dependent Setup Time
5	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform
6	Research And Implementation Of Hadoop Platform Performance Optimization
7	MDE-Based Approach For Mapreduce Bigdata Transformation Software Development
8	Scheduling flexible flow lines with sequence dependent setup times
9	Research On The Implementation Of Bursty Events Detection Based On Spark
10	Research And Improvement Of Resource Scheduler Algorithm Based On Hadoop