Research On Energy Consumption Model And Energy Saving Scheduling Algorithm For Spark

Posted on:2019-07-24

Degree:Master

Type:Thesis

Country:China

Candidate:H C Wang

Full Text:PDF

GTID:2428330590465749

Subject:Computer Science and Technology

Abstract/Summary:

Energy consumption is growing explosively with the demand for big data computing, and the high carbon emissions of big data platforms have a serious impact on the environment. Because big data platforms are widely used and provide fast computing services in many fields, it is of great significance to study how to reduce their energy consumption. MapReduce provides an effective way to process large-scale data in a distributed manner. However, its limited performance and lack of real-time processing capability led to the birth of Apache Spark. Spark, an efficient open-source big data framework, has become the first choice for most enterprises and data centers. Spark provides two scheduling strategies, FIFO and FAIR. Neither native strategy considers energy consumption during task scheduling, so there is considerable room to reduce energy consumption. To address the energy consumption problems in Spark scheduling, this thesis makes the following contributions:

1. An energy efficiency relationship strategy table is proposed to record each task's running time and energy consumption. This table plays a key role in the task scheduling process.
2. Based on how Spark submits jobs, divides them into stages, and distributes tasks, this thesis constructs energy consumption models for stages, jobs, and applications, and proposes the objective function of the Spark energy scheduling problem.
3. Based on the energy efficiency strategy table and the Spark energy consumption model, this thesis proposes two energy-aware Spark scheduling algorithms, A-type and B-type. The A-type algorithm greedily assigns each task to a node with low energy consumption while satisfying data locality. To address the long running time of the A-type algorithm when there are few tasks, the B-type algorithm reduces the execution time by balancing tasks across the nodes. Each algorithm has its own applicable scenarios.

In summary, the energy-aware Spark scheduling strategies proposed in this study are characterized by energy awareness and dynamic scheduling. Four workloads were selected from the HiBench benchmark and a variety of experiments were performed. The experiments show that both energy-saving scheduling algorithms effectively reduce the energy consumption of Spark applications. Compared with the native scheduling strategies, the A-type algorithm reduces energy consumption by 22% to 34% on average, and the B-type algorithm reduces it by 20% to 31% on average.
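
The abstract does not give the details of the A-type and B-type algorithms, so the following is only a minimal sketch of the two ideas it describes, not the thesis's implementation. It assumes a hypothetical strategy table keyed by (task type, node) that stores a runtime and an energy value, one task slot per node, and made-up numbers. The function name `schedule`, the `balance` flag, and the tie-breaking rules are all illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical strategy table: (task type, node) -> (runtime in s, energy in J).
StrategyTable = Dict[Tuple[str, str], Tuple[float, float]]
# A task is (task id, task type, nodes that hold its input data).
Task = Tuple[str, str, Set[str]]


def schedule(tasks: List[Task], nodes: List[str], table: StrategyTable,
             balance: bool = False):
    """Assign tasks one at a time; return (assignment, total energy, makespan).

    balance=False: pick the lowest-energy candidate node (greedy, A-type-like).
    balance=True:  pick the candidate that would finish earliest (B-type-like).
    Both rules are assumptions made for illustration.
    """
    busy_until = {n: 0.0 for n in nodes}  # assumed: one task at a time per node
    assignment = {}
    total_energy = 0.0
    for task_id, task_type, local_nodes in tasks:
        # Prefer nodes that satisfy data locality; fall back to all nodes.
        candidates = [n for n in nodes if n in local_nodes] or list(nodes)
        if balance:
            def key(n):
                runtime, energy = table[(task_type, n)]
                return (busy_until[n] + runtime, energy)
        else:
            def key(n):
                runtime, energy = table[(task_type, n)]
                return (energy, busy_until[n])
        chosen = min(candidates, key=key)
        runtime, energy = table[(task_type, chosen)]
        busy_until[chosen] += runtime
        total_energy += energy
        assignment[task_id] = chosen
    return assignment, total_energy, max(busy_until.values())


# Made-up example: n1 is slower but cheaper in energy than n2.
NODES = ["n1", "n2"]
TABLE = {("map", "n1"): (10.0, 500.0), ("map", "n2"): (8.0, 650.0)}
TASKS = [(f"t{i}", "map", {"n1", "n2"}) for i in range(4)]

print(schedule(TASKS, NODES, TABLE)[1:])                # (2000.0, 40.0)
print(schedule(TASKS, NODES, TABLE, balance=True)[1:])  # (2300.0, 20.0)
```

In this toy setting the greedy rule places every task on the cheapest node, which minimizes energy but serializes the work, while the balancing rule halves the completion time at a somewhat higher energy cost. That mirrors the trade-off the abstract describes between the two algorithms when there are few tasks, although the sketch ignores idle power and other effects that a full energy model would have to cover.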

Keywords/Search Tags:

Apache Spark, Big Data, Energy-Aware, Scheduling Strategy

Related items

1	OCTWAS - Online Check-pointer for Workflows on Apache Spark
2	Rack aware scheduling in HPC data centers: An energy conservation strategy
3	Research On Taxi Trajectory Organization Method Based On Apache Spark
4	Research And Implementation Of Energy Efficiency Scheduling Based On DVFS In Spark On YARN
5	Research On Association Mining Optimization Based On Spark Distributed And Application Of Comprehensive Decision
6	Using apache spark for scalable gene sequence analysis
7	Research On Cache Mechanism And Job Scheduling Policy In Spark
8	Design And Implementation Of A Performance Modeling System On Apache Spark
9	Research On Energy-Aware Resource Scheduling Mechanism In Data Center Networks
10	Research On The Discretization Algorithm Of Big Data Based On Spark