
Research On Energy Consumption Model And Energy Saving Scheduling Algorithm For Spark

Posted on: 2019-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: H C Wang
GTID: 2428330590465749
Subject: Computer Science and Technology
Abstract/Summary:
Energy consumption is growing explosively with the demand for big data computing, and the high carbon emissions of big data platforms have a serious impact on the environment. Because big data platforms are widely used and provide fast computing services in many fields, studying how to reduce their energy consumption is of great significance. MapReduce provides an effective way to process large-scale data in a distributed manner, but its limited performance and lack of real-time processing capability led to the birth of Apache Spark. Spark, an efficient open-source big data framework, has become the first choice for most enterprises and data centers. Spark provides two scheduling strategies, FIFO and FAIR. Neither native strategy considers energy consumption during task scheduling, so there is considerable room to reduce it. To address the energy consumption problems in Spark scheduling, this thesis makes the following contributions:

1. An energy efficiency relationship strategy table is proposed to record each task's running time and energy consumption. This table plays a key role in the task scheduling process.
2. Following the way Spark submits jobs, divides stages, and distributes tasks, this thesis constructs energy consumption models for the stage, the job, and the application, and proposes the objective function of the Spark energy scheduling problem.
3. Based on the energy efficiency strategy table and the Spark energy consumption model, this thesis proposes two energy-aware Spark scheduling algorithms, A-type and B-type. The A-type algorithm greedily assigns each task to the node with the lowest energy consumption, subject to data locality. To address the long running time of the A-type algorithm when there are few tasks, the B-type algorithm reduces execution time by balancing tasks across the nodes. Each algorithm has its own applicable scenarios.

In summary, the energy-aware Spark scheduling strategy proposed in this study is characterized by energy awareness and dynamic scheduling. Four workloads from the HiBench benchmark were selected and a variety of experiments were performed. The experiments show that the two energy-saving scheduling algorithms effectively reduce the energy consumption of Spark applications. Compared with the native scheduling strategy, the A-type algorithm reduces energy consumption by 22% to 34% on average, and the B-type algorithm reduces it by 20% to 31% on average.
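
The abstract does not give the details of the strategy table, the energy model, or either algorithm, so the following Python sketch is only an illustration of the general idea, not the thesis's method. Everything in it is an assumption: the table layout (`TABLE`, mapping a node to seconds and joules per task), the node names and numbers, the rule that tasks on one node run serially, the additive stage energy (idle power is ignored), and the function names `a_type`, `b_type`, and `stage_cost`. The `b_type` rule in particular is one plausible reading of "balancing the tasks on the nodes".

```python
from collections import defaultdict

# Hypothetical energy-efficiency table: node -> (seconds per task, joules per task).
# All values are invented for illustration.
TABLE = {
    "node-a": (10.0, 50.0),
    "node-b": (6.0, 80.0),
    "node-c": (8.0, 65.0),
}

# Hypothetical stage: task -> nodes that hold its input data (empty = no preference).
TASKS = {
    "t1": ["node-a", "node-b"],
    "t2": ["node-a", "node-b"],
    "t3": ["node-a", "node-c"],
    "t4": [],
}


def a_type(tasks, table):
    """Greedy sketch: assign each task to the lowest-energy node among its data-local nodes."""
    plan = {}
    for task, local_nodes in tasks.items():
        candidates = local_nodes or list(table)  # fall back to every node
        plan[task] = min(candidates, key=lambda n: table[n][1])
    return plan


def b_type(tasks, table):
    """Balancing sketch: assign each task to the node that would finish it earliest,
    breaking ties by lower energy."""
    busy = defaultdict(float)  # accumulated running time per node
    plan = {}
    for task, local_nodes in tasks.items():
        candidates = local_nodes or list(table)
        node = min(candidates, key=lambda n: (busy[n] + table[n][0], table[n][1]))
        busy[node] += table[node][0]
        plan[task] = node
    return plan


def stage_cost(plan, table):
    """Return (makespan, energy) for one stage, assuming tasks on a node run one after another."""
    busy = defaultdict(float)
    energy = 0.0
    for node in plan.values():
        busy[node] += table[node][0]
        energy += table[node][1]
    return max(busy.values(), default=0.0), energy


if __name__ == "__main__":
    for name, scheduler in (("A-type", a_type), ("B-type", b_type)):
        makespan, energy = stage_cost(scheduler(TASKS, TABLE), TABLE)
        print(f"{name}: makespan={makespan}s energy={energy}J")
```

With these invented numbers the greedy rule places all four tasks on the most energy-efficient node, which minimizes energy but serializes the stage, while the balancing rule spreads the tasks out and finishes sooner at a higher energy cost. This mirrors the trade-off the abstract describes for a small number of tasks. Under the same additive assumption, a job's energy would be the sum over its stages and an application's energy the sum over its jobs.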
Keywords/Search Tags:Apache Spark, Big Data, Energy-Aware, Scheduling Strategy