
Scheduling Spark Tasks To Heterogeneous Cluster

Posted on: 2020-07-15  Degree: Master  Type: Thesis
Country: China  Candidate: S Fan  Full Text: PDF
GTID: 2428330590960012  Subject: Software engineering
Abstract/Summary:
Spark, a highly efficient DAG-based distributed computing framework, is widely used in complex big-data processing such as e-commerce, the Internet of Things, and data analysis, and its task scheduling is an important factor influencing the performance of big-data analysis. With the expansion of applications and the rapid growth of data volume, it is no longer practical to rely on a single data center to store and process massive data. In addition, with the introduction of high-performance machines, data centers are no longer made up of homogeneous machines. It is therefore of practical significance to study task scheduling in heterogeneous Spark clusters.

In this paper, we consider scheduling problems that minimize the maximum completion time (makespan) of all involved tasks, for which great challenges arise from the dynamic availability of cores, the embedded precedence constraints, and the heterogeneity of nodes. We first establish a mathematical model based on the embedded precedence constraints among the job set, the stage set, and the independent task set. We then propose the Spark Task Scheduling Algorithm (STSA) to solve this class of scheduling problems. The algorithm consists of four parts: temporal parameter estimation, dynamic job-sequence adjusting, a DAG-based stage scheduler, and task scheduling by earliest finish time. The temporal parameters of both stages and jobs are recursively estimated by forward and backward calculations. Based on these temporal parameters, we propose three job-sorting rules to dynamically adjust the scheduling sequence of jobs. In the stage-scheduling process, we propose two stage-weight setting rules; based on the stage weights, computing resources are allocated equitably among stages. Max-min double-tier heaps are constructed to schedule tasks to appropriate executor cores on heterogeneous nodes, and an insertion-based variable neighborhood search further optimizes the makespan.

To verify the efficiency and effectiveness of the proposed algorithm, we use multivariate analysis of variance to tune parameters and obtain the most appropriate values for the considered problem. We compare our algorithm with existing algorithms at different application scales and in different data centers. Experimental results show that the proposed algorithm outperforms the existing FIFO and FAIR schedulers.
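The "task scheduling with earliest finish time" idea described above can be illustrated with a minimal sketch. This is not the thesis's actual STSA (which uses max-min double-tier heaps and a neighborhood search); it is a simplified greedy earliest-finish-time assignment under assumed inputs: each task carries a per-node runtime estimate (heterogeneous nodes give different runtimes), and each node exposes the times at which its cores become free. All names (`eft_assign`, `task_runtimes`, `core_free`) are hypothetical.

```python
import heapq

def eft_assign(task_runtimes, core_free):
    """Greedy earliest-finish-time assignment (illustrative sketch only).

    task_runtimes: list of dicts, one per task, mapping node -> runtime of
                   that task on that node (heterogeneity => different values).
    core_free:     dict mapping node -> list of times at which each of its
                   cores becomes available.
    Returns a list of (node, core_index, start, finish) tuples, one per task.
    """
    # Per-node min-heap of (free_time, core_index): the soonest-free core
    # on each node is always at the heap root.
    heaps = {n: [(t, i) for i, t in enumerate(ts)] for n, ts in core_free.items()}
    for h in heaps.values():
        heapq.heapify(h)

    schedule = []
    for runtimes in task_runtimes:
        # Choose the (node, core) pair that minimizes this task's finish time.
        best = None
        for node, rt in runtimes.items():
            free_t, core = heaps[node][0]   # soonest-free core on this node
            finish = free_t + rt
            if best is None or finish < best[0]:
                best = (finish, node, core, free_t)
        finish, node, core, start = best
        # That core is now busy until `finish`.
        heapq.heapreplace(heaps[node], (finish, core))
        schedule.append((node, core, start, finish))
    return schedule
```

For example, with a one-core "fast" node and a one-core "slow" node, two identical tasks both end up on the fast node back-to-back, because even after queueing, the fast node still yields the earlier finish time.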
Keywords/Search Tags: Spark, DAG, Task Scheduling, STSA, Heterogeneous nodes