
Zone Division And Dynamic Load Scheduling Algorithm Based On Heterogeneous Spark Cluster

Posted on: 2020-12-26 | Degree: Master | Type: Thesis
Country: China | Candidate: X Zhu | Full Text: PDF
GTID: 2428330596975459 | Subject: Software engineering
Abstract/Summary:
With the development of big data technology, a variety of big data processing frameworks have emerged. Spark is currently the most widely used of these computing frameworks: it supports in-memory computation, provides interactive computing and querying, offers rich data-manipulation operations, and underpins services such as data mining, machine learning, and stream computing. A computer cluster is heterogeneous when its machines have different hardware configurations, so that their performance on Spark jobs differs. The growth of cloud computing and shared data centers has made clusters increasingly heterogeneous, and the rise of machine learning has further added machines with mixed CPU and GPU architectures. Cluster load balancing distributes jobs or tasks across multiple computing units to increase throughput and improve data-processing capacity and availability. Because of resource isolation and resource reuse, the runtime environment of modern software architectures is increasingly complex, and Spark may run under complex and variable load conditions.

An analysis of the Spark source code reveals two problems: Spark's resource allocation strategy, which assumes a homogeneous number of processor cores, cannot adapt to a heterogeneous cluster environment; and Spark's task scheduling lacks a load-based scheduling strategy.

To address these two points, this thesis proposes a zone division and dynamic load scheduling algorithm for heterogeneous Spark clusters. The algorithm consists of two parts: zone-based job scheduling and dynamic-load task scheduling. Zone-based job scheduling comprises dividing the cluster into zones and allocating job resources by zone. A zone is a group of computers in the cluster that have the same number of processor cores and similar benchmark performance. Zone-based resource allocation assigns computing resources to a Spark job from a specified zone, or an adjacent zone, according to the user's configuration. In this way the heterogeneity of the cluster is fully exploited to speed up Spark jobs, and the user can allocate computing resources of different performance levels according to job priority. Dynamic-load task scheduling means that Spark schedules tasks based on load information collected periodically from each node of the cluster. This lets Spark avoid or reduce the use of high-load nodes and schedule more tasks onto low-load nodes, further accelerating Spark jobs.
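The abstract does not include code, so the following is only a minimal Scala sketch of the two ideas under stated assumptions: a hypothetical Node record carrying a core count, a benchmark score, and a recently sampled load; a zone-division step that groups nodes by core count and then by similar benchmark scores; and a load-aware placement step that prefers the least-loaded node. The names (Node, divideZones, pickNode) and the tolerance parameter tol are illustrative assumptions, not the thesis's implementation.

```scala
// Minimal illustrative sketch, NOT the thesis's implementation: Node,
// divideZones, pickNode, and the tolerance `tol` are all hypothetical.
case class Node(host: String, cores: Int, benchScore: Double, load: Double)

object ZoneLoadSketch {
  // Zone division: group nodes by core count, then split each group into
  // zones whose benchmark scores stay within a relative tolerance of the
  // lowest score in the zone ("similar performance in the benchmark").
  def divideZones(nodes: Seq[Node], tol: Double = 0.15): Seq[Seq[Node]] =
    nodes.groupBy(_.cores).values.toSeq.flatMap { sameCores =>
      val sorted = sameCores.sortBy(_.benchScore).toList
      sorted.foldLeft(List.empty[List[Node]]) {
        // Node still within tolerance of the current zone's lowest score:
        case (zone :: rest, n) if n.benchScore <= zone.head.benchScore * (1 + tol) =>
          (zone :+ n) :: rest
        // Otherwise start a new zone (also handles the very first node).
        case (zones, n) => List(n) :: zones
      }
    }

  // Dynamic-load placement: prefer the node with the lowest recently
  // sampled load, so high-load nodes are avoided or used less.
  def pickNode(zone: Seq[Node]): Option[Node] =
    zone.sortBy(_.load).headOption
}
```

A real scheduler would hook such decisions into Spark's internal TaskScheduler and collect per-node load through periodic heartbeats; the sketch above shows only the grouping and ordering logic.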
Keywords/Search Tags: Spark, heterogeneity, load balance