
Scheduling Spark Tasks To Heterogeneous Cluster

Posted on: 2020-07-15  Degree: Master  Type: Thesis
Country: China  Candidate: S Fan  Full Text: PDF
GTID: 2428330590960012  Subject: Software engineering
Abstract/Summary:
Spark, a highly efficient DAG-based distributed computing framework, is widely used in complex big-data processing such as e-commerce, the Internet of Things, and data analysis, and its task scheduling is an important factor influencing the performance of big-data analysis. With the expansion of applications and the rapid growth of data volume, it is no longer practical to rely on a single data center to store and process massive data. In addition, with the introduction of high-performance machines, data centers are no longer made up of homogeneous machines. It is therefore of practical significance to study task scheduling in heterogeneous Spark clusters.

In this paper, we consider scheduling problems that minimize the maximum completion time (makespan) of all involved tasks, for which great challenges arise from the dynamic availability of cores, the embedded precedence constraints, and the heterogeneity of nodes. We first establish a mathematical model based on the embedded precedence constraints among the job set, the stage set, and the independent task set. We then propose the Spark Task Scheduling Algorithm (STSA) to solve this class of scheduling problems. The algorithm consists of four parts: temporal parameter estimation, dynamic job-sequence adjusting, a DAG-based stage scheduler, and task scheduling by earliest finish time. The temporal parameters of both stages and jobs are recursively estimated by forward and backward calculations. Based on these temporal parameters, we propose three job-sorting rules to dynamically adjust the scheduling sequence of jobs. In the stage-scheduling process, we propose two stage-weight setting rules; based on the stage weights, computing resources are allocated equitably among stages. Max-min double-tier heaps are constructed to schedule tasks to appropriate executor cores on heterogeneous nodes, and an insertion-based variable neighborhood search further optimizes the makespan.

To verify the efficiency and effectiveness of the proposed algorithm, we use multivariate analysis of variance to tune parameters and obtain the most appropriate values for the considered problem. We compare our algorithm with existing algorithms at different application scales and in different data centers. Experimental results show that the proposed algorithm outperforms the existing FIFO and FAIR schedulers.
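The "task scheduling with earliest finish time" idea described above can be illustrated with a minimal sketch. This is not the thesis's actual STSA (which uses max-min double-tier heaps and a neighborhood search); it is a simplified greedy earliest-finish-time assignment under assumed inputs: each task carries a per-node runtime estimate (heterogeneous nodes give different runtimes), and each node exposes the times at which its cores become free. All names (`eft_assign`, `task_runtimes`, `core_free`) are hypothetical.

```python
import heapq

def eft_assign(task_runtimes, core_free):
    """Greedy earliest-finish-time assignment (illustrative sketch only).

    task_runtimes: list of dicts, one per task, mapping node -> runtime of
                   that task on that node (heterogeneity => different values).
    core_free:     dict mapping node -> list of times at which each of its
                   cores becomes available.
    Returns a list of (node, core_index, start, finish) tuples, one per task.
    """
    # Per-node min-heap of (free_time, core_index): the soonest-free core
    # on each node is always at the heap root.
    heaps = {n: [(t, i) for i, t in enumerate(ts)] for n, ts in core_free.items()}
    for h in heaps.values():
        heapq.heapify(h)

    schedule = []
    for runtimes in task_runtimes:
        # Choose the (node, core) pair that minimizes this task's finish time.
        best = None
        for node, rt in runtimes.items():
            free_t, core = heaps[node][0]   # soonest-free core on this node
            finish = free_t + rt
            if best is None or finish < best[0]:
                best = (finish, node, core, free_t)
        finish, node, core, start = best
        # That core is now busy until `finish`.
        heapq.heapreplace(heaps[node], (finish, core))
        schedule.append((node, core, start, finish))
    return schedule
```

For example, with a one-core "fast" node and a one-core "slow" node, two identical tasks both end up on the fast node back-to-back, because even after queueing, the fast node still yields the earlier finish time.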
Keywords/Search Tags: Spark, DAG, Task Scheduling, STSA, Heterogeneous nodes