Research Of The Energy-efficient Scheduler For Hadoop Based On Storage Driven

Posted on:2017-04-21

Degree:Master

Type:Thesis

Country:China

Candidate:L Wang

Full Text:PDF

GTID:2308330503987207

Subject:Computer Science and Technology

Abstract/Summary:

The 21st century is the age of information, and the volume of data grows rapidly as information is transmitted. People have gradually recognized the value hidden in this massive data, and many frameworks for massive data analysis have been developed; Hadoop is one of the most classic of them. Hadoop requires many data nodes in a cluster and provides efficient parallel computing over massive data through HDFS, YARN, MapReduce, and other components.

However, in recent years emissions of carbon dioxide and other greenhouse gases have increased year by year, intensifying global warming. Large-scale data centers depend on huge server clusters and large-scale cooling equipment, both of which consume large amounts of energy and add to business costs. Energy conservation in data centers is therefore increasingly important, and controlling the energy consumption of Hadoop clusters matters both for corporate survival and for global climate protection.

Through an in-depth analysis of Hadoop's two replica storage strategies, the Random Volume Choosing Policy and the Round Robin Volume Choosing Policy, this paper identifies their shortcomings in energy control. Building on Hadoop's data-locality scheduling principle, the paper proposes a strategy that uses data locality to fold task scheduling into replica scheduling, and designs a storage-driven energy-efficient scheduler for Hadoop. At the core of this scheduler are two replica storage strategies that control cluster load balancing and energy consumption.

The storage-driven energy-efficient scheduler for Hadoop has the following characteristics:

1) The scheduler can reduce task running time and control the total energy consumption of a job while the Hadoop cluster is running.
2) The scheduler uses a two-tier replica storage strategy, which is the core of the energy-efficient policy. The first tier uses the remaining space of volumes and the number of disk read and write operations to improve overall cluster performance through load balancing, thereby reducing task running time.
3) The second tier focuses on the characteristics of different kinds of tasks and the real-time status of data nodes. It matches tasks to data nodes by these features so that a task consumes the least energy when running on its target data node, thereby reducing the energy consumption of the cluster.

Finally, the paper builds a Hadoop cluster environment with 32 data nodes on the XenServer platform. In this environment, two sets of experiments first verify the correctness of the energy measurement model and the derived formulas. Three comparative experiments then compare the storage-driven energy-efficient scheduler with Hadoop's two replica storage strategies, the Random Volume Choosing Policy and the Round Robin Volume Choosing Policy. The results show that the storage-driven scheduler controls data node load balancing better, reduces the overall running time of Hadoop cluster tasks to some extent, and achieves a comparatively notable energy saving in overall cluster energy consumption.
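The two-tier idea described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the `DataNode` and `Task` fields, the equal weighting of free space and disk idleness in tier 1, the linear power model in tier 2, and all function names. It is not the thesis's actual scheduler and does not use any Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    free_ratio: float   # fraction of disk space still free, 0..1
    io_ops: int         # recent disk read + write operations
    cpu_util: float     # current CPU utilization, 0..1
    speed: float        # relative compute speed (1.0 = baseline), hypothetical
    idle_watts: float   # power draw when idle, hypothetical
    peak_watts: float   # power draw at full load, hypothetical


@dataclass
class Task:
    cpu_demand: float   # CPU share the task needs, 0..1
    cpu_seconds: float  # running time on a baseline-speed node


def tier1_candidates(nodes, keep=2, space_weight=0.5):
    """Tier 1 (load balancing): keep the nodes with the most free
    space and the fewest recent disk operations."""
    if not nodes:
        return []
    max_ops = max(n.io_ops for n in nodes) or 1  # avoid division by zero

    def load_score(n):
        disk_idleness = 1 - n.io_ops / max_ops
        return space_weight * n.free_ratio + (1 - space_weight) * disk_idleness

    return sorted(nodes, key=load_score, reverse=True)[:keep]


def estimated_energy(task, node):
    """Extra joules for running the task on the node, under an
    assumed linear power model (power grows linearly with CPU load)."""
    runtime = task.cpu_seconds / node.speed
    extra_watts = (node.peak_watts - node.idle_watts) * task.cpu_demand
    return extra_watts * runtime


def tier2_pick(task, candidates):
    """Tier 2 (energy matching): among nodes with enough spare CPU,
    pick the one with the lowest estimated energy for this task."""
    feasible = [n for n in candidates if n.cpu_util + task.cpu_demand <= 1.0]
    if not feasible:
        return None
    return min(feasible, key=lambda n: estimated_energy(task, n))


def place_replica(task, nodes, keep=2):
    """Choose where to store a replica; with data-locality scheduling,
    the task that reads the replica would then run on that node."""
    return tier2_pick(task, tier1_candidates(nodes, keep=keep))
```

In this sketch the replica placement decision doubles as the task placement decision, which mirrors the abstract's point that data locality lets storage decisions drive scheduling. A real implementation would need measured power figures and task profiles in place of the hypothetical fields above.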

Keywords/Search Tags:

Green Computing, Big Data Analysis, Hadoop, HDFS, Replications Strategy

Related items

1	Join Processing And Optimizing On Large Data Sets Based On Hadoop Framework
2	Research On The Energy-aware Scheduler For Hadoop
3	Research On Storage Strategies And Optimization Hadoop Platform
4	Mass Sales Data Processing Platform Design And Implementation
5	The Design Of The Cloud Computing System Based On Hadoop
6	Analysis And Application Development Of Hadoop Distributed Computing Platform
7	The Cloud Computing Based On Hadoop Platform And Log Analysis
8	Research On Optimization Of Data Redundancy Strategy Based On HDFS
9	Research On Big Data Text Analysis Based On Hadoop Architecture
10	Research On Security Key Issues Of Cloud Computing Based On Hadoop