The Research Of Load Comprehensive Evaluation And Dynamic Resource Scheduling For Spark Cluster

Posted on:2022-05-12

Degree:Master

Type:Thesis

Country:China

Candidate:B Liu

Full Text:PDF

GTID:2568306488478764

Subject:Safety science and engineering

Abstract/Summary:

With the explosive growth of data across industries and rising business demands, data storage and computing technologies face new challenges, and big data technologies are developing rapidly. Spark is a distributed big data computing framework based on in-memory computing. It is widely used because of its reliable performance and its rich computing functions, such as machine learning, graph computing, and stream computing, but it also has problems. The hardware that underpins big data services is updated, replaced, and expanded over time, so the nodes in a Spark cluster end up with different hardware configurations, which makes the cluster heterogeneous. Computing nodes differ in performance, and Spark jobs perform differently on different nodes. An analysis of the Spark source code shows that Spark's resource scheduling strategy assumes a homogeneous cluster; in a heterogeneous cluster it allocates resources unevenly, which affects node load and, in turn, the efficiency of job execution. Therefore, building on Spark's default resource manager, this work improves cluster resource scheduling by taking the heterogeneity of computing nodes into account.

Spark's default resource scheduling strategy considers only the number of remaining CPU cores on each computing node, which is insufficient for heterogeneous clusters. To address this, this paper first proposes a method for the comprehensive evaluation of node load. Load evaluation indices for heterogeneous nodes are established by considering both the static performance of the nodes and their dynamic load information at runtime. The weights of the evaluation indices are then determined by the cluster analytic hierarchy process (AHP), yielding a quantitative model of the real-time load of a computing node. Next, a feedback mechanism is added to Spark's original resource manager: the order in which computing nodes are considered during resource allocation is adjusted periodically according to the real-time load quantified by the comprehensive load evaluation method, which realizes a dynamic resource scheduling strategy based on the comprehensive evaluation of node load. Finally, comparative experiments on a deployed Spark platform show that the strategy effectively alleviates load imbalance and improves the execution efficiency of the cluster, and that it has good scalability.
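
The idea of weighting load indices and reordering nodes by quantified load can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it uses plain AHP with the row geometric mean approximation rather than the cluster AHP named above, it scores only runtime utilization and leaves out static node performance, and the indicator names, pairwise judgments, node names, and metric values are all hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP weights with the row geometric mean method.

    pairwise[i][j] states how much more important indicator i is than j.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def load_score(metrics, weights):
    """Weighted sum of utilization metrics in [0, 1]; higher means busier."""
    return sum(weights[name] * metrics[name] for name in weights)


def order_nodes(nodes, weights):
    """Return node names sorted from least loaded to most loaded."""
    return sorted(nodes, key=lambda name: load_score(nodes[name], weights))


# Hypothetical judgments: CPU matters twice as much as memory,
# and memory twice as much as disk I/O.
INDICATORS = ["cpu", "mem", "io"]
PAIRWISE = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
WEIGHTS = dict(zip(INDICATORS, ahp_weights(PAIRWISE)))

# Hypothetical utilization snapshot reported by each worker.
NODES = {
    "worker-a": {"cpu": 0.80, "mem": 0.60, "io": 0.30},
    "worker-b": {"cpu": 0.20, "mem": 0.40, "io": 0.50},
    "worker-c": {"cpu": 0.50, "mem": 0.50, "io": 0.50},
}

# A feedback loop would recompute this order each time fresh metrics arrive.
ORDER = order_nodes(NODES, WEIGHTS)
print(WEIGHTS)
print(ORDER)
```

In a scheduler with feedback, the resulting order would be refreshed periodically and consulted when resources are handed out, so that lightly loaded nodes are offered work first.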

Keywords/Search Tags:

Spark, Heterogeneous Cluster, Load Balancing, Resource Scheduling, Analytic Hierarchy Process, Cluster Analysis

Related items

1	Study Of Load Balancing Algorithm Based On Process Migration Mechanism In Heterogeneous Cluster Environment
2	Study On Load Balancing Method In Heterogeneous Wireless Networks
3	Research And Optimization For High Concurrency Cluster Load Balancing Strategy Based On Nginx
4	Research And Implementation Of LVS Cluster Load Balancing Scheduling Algorithm Based On PSO-GA
5	Research And Realization Of Load Balancing Algorithm Based On Heterogeneous Cluster Dynamic Feedback
6	Algorithm Based On Nginx Research On Dynamic Load Balancing Strategy
7	Zone Division And Dynamic Load Scheduling Algorithm Based On Heterogeneous Spark Cluster
8	Based On Feedback Scheduling Algorithms For Dynamic Load Balancing In The Heterogeneous Environment Of Hadoop Design And Implementation
9	The Application Of Linux Cluster System Based On The Load Balancing Algorithm In Webgis
10	Research On Load Balancing Technology In Cluster System