
Research On The Energy-aware Scheduler For Hadoop

Posted on: 2015-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: W Li
GTID: 2298330422990921
Subject: Computer Science and Technology
Abstract/Summary:
Data in every sector are growing rapidly, and both academia and industry have recognized the great value hidden in them. As a result, a variety of data analysis platforms have been developed; among them, Hadoop is an open-source implementation of the MapReduce computing model and the GFS storage model proposed by Google. In recent years the accumulation of greenhouse gases has been changing the global climate, so carbon emissions must be given a prominent place in data center construction; at the same time, data center energy consumption is a significant part of business costs. As Hadoop clusters grow in scale, energy consumption has become a serious problem. Studying how to reduce the energy consumption of a Hadoop cluster is therefore important both for reducing business costs and for protecting the environment.

Building on a study of the architecture and working principles of the Hadoop platform and MapReduce, this thesis proposes an architecture that controls energy consumption from the perspective of resource and task scheduling on the Hadoop platform. Analysis and testing of Hadoop's FIFO Scheduler and Capacity Scheduler showed that both have defects and shortcomings as a basis for an energy control framework. To address them, the thesis designs and implements an energy-aware scheduler for Hadoop. The scheduler provides a framework for energy control and uses a two-layer scheduling policy that performs energy-efficient scheduling from jobs down to resources.

The energy-aware scheduler designed in this thesis has two main characteristics:
1) it can adjust and balance the QoS and the total energy consumption of jobs while the Hadoop cluster is running;
2) its scheduling policy is efficient.
The overall framework of the scheduler is based on multiple queues, and the two-layer scheduling strategy dynamically matches jobs to computing resources so as to save energy.
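The abstract does not give the cost model or the exact policy used in each layer. As a rough illustration only, the Python sketch below assumes a hypothetical model in which each node has a relative `speed` and a busy-power figure `power_watts`, and each job has a per-task amount of `task_work`. Layer 1 picks the next job with pending tasks by scanning the queues round-robin (FIFO within a queue); layer 2 picks the free node that minimizes a normalized weighted sum of task time and task energy, where the hypothetical knob `alpha` trades QoS (time) against energy. Each decision scans the jobs once and the nodes once, so its cost is linear in the number of jobs plus nodes. None of these names or formulas come from the thesis.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    speed: float        # hypothetical: work units processed per second
    power_watts: float  # hypothetical: average power draw while busy
    free_slots: int


@dataclass
class Job:
    name: str
    task_work: float    # hypothetical: work units per task
    pending_tasks: int


def pick_job(queues: list[deque[Job]], start: int) -> Optional[tuple[int, Job]]:
    """Layer 1 (jobs): scan queues round-robin from `start`, FIFO within a queue.

    Visits each job at most once, so the cost is O(number of jobs).
    """
    for offset in range(len(queues)):
        qi = (start + offset) % len(queues)
        for job in queues[qi]:
            if job.pending_tasks > 0:
                return qi, job
    return None


def pick_node(job: Job, nodes: list[Node], alpha: float) -> Optional[Node]:
    """Layer 2 (resources): choose the free node with the lowest weighted cost.

    alpha = 1.0 weighs only task time (QoS); alpha = 0.0 weighs only energy.
    Visits each node a constant number of times: O(number of nodes).
    """
    free = [n for n in nodes if n.free_slots > 0]
    if not free:
        return None
    times = [job.task_work / n.speed for n in free]
    energies = [n.power_watts * t for n, t in zip(free, times)]
    # Normalize both terms by their maxima so alpha acts as a unitless weight.
    t_max = max(times) or 1.0
    e_max = max(energies) or 1.0
    costs = [alpha * t / t_max + (1.0 - alpha) * e / e_max
             for t, e in zip(times, energies)]
    return free[costs.index(min(costs))]


def schedule(queues: list[deque[Job]], nodes: list[Node], alpha: float) -> list[tuple[str, str]]:
    """Assign tasks until no job has pending tasks or no node has a free slot."""
    plan: list[tuple[str, str]] = []
    start = 0
    while (picked := pick_job(queues, start)) is not None:
        qi, job = picked
        node = pick_node(job, nodes, alpha)
        if node is None:
            break
        job.pending_tasks -= 1
        node.free_slots -= 1
        plan.append((job.name, node.name))
        start = (qi + 1) % len(queues)  # rotate to the next queue for fairness
    return plan


if __name__ == "__main__":
    def demo(alpha: float) -> list[tuple[str, str]]:
        nodes = [Node("fast_hot", 2.0, 200.0, 1), Node("slow_cool", 1.0, 60.0, 1)]
        queues = [deque([Job("j1", 10.0, 1)])]
        return schedule(queues, nodes, alpha)

    print(demo(0.0))  # [('j1', 'slow_cool')]: energy-only weighting
    print(demo(1.0))  # [('j1', 'fast_hot')]: time-only weighting
```

In the demo, the fast node finishes the task in 5 s using 1000 J, while the slow node takes 10 s but uses only 600 J, so moving `alpha` from 1.0 to 0.0 shifts the choice from the fast node to the low-power one. This is one way a single parameter could express the QoS/energy balance the abstract describes; the thesis may use a different mechanism.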
The two-layer scheduling strategy is efficient, with linear time complexity. Across the multiple queues, jobs are distributed with a method similar to consistent hashing, which keeps dynamic allocation efficient and supports high concurrency.

Finally, the thesis uses XCP (Xen Cloud Platform) to build a Hadoop cluster of 32 virtual machines. On this cluster, it runs comparative experiments between the energy-aware scheduler and Hadoop's FIFO Scheduler and Capacity Scheduler. The experiments have two goals: to compare the total running time and energy consumption of jobs on the cluster, and to evaluate how well the energy-aware scheduler controls time cost and energy cost. The experiments use different input jobs and different schedulers. The results show that the proposed energy-aware scheduler has better energy control capability: it reduces the cluster's energy consumption further without increasing the running time, and it controls the balance between time cost and energy cost well.
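The abstract only says that jobs are spread across the multiple queues with a method similar to consistent hashing, without giving details. The Python sketch below is a textbook consistent-hash ring, not the thesis's method: each queue is placed on the ring at several hashed points (the hypothetical `replicas` parameter), and a job ID is assigned to the queue owning the first point clockwise from the job's hash. The property that makes this attractive for dynamic allocation is that adding or removing a queue remaps only the jobs on the affected arcs; every other job keeps its queue. `QueueRing` and its method names are hypothetical, and the sketch shows only the mapping, not how queues are served concurrently.

```python
from __future__ import annotations

import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit ring position (hash used only for even spread)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class QueueRing:
    """Consistent-hash ring mapping job IDs to queue names (illustrative sketch)."""

    def __init__(self, queues: list[str], replicas: int = 64):
        self.replicas = replicas          # virtual points per queue; smooths load
        self._points: list[int] = []      # sorted ring positions
        self._owner: dict[int, str] = {}  # ring position -> queue name
        for q in queues:
            self.add_queue(q)

    def add_queue(self, queue: str) -> None:
        for r in range(self.replicas):
            p = _hash(f"{queue}#{r}")
            if p in self._owner:  # skip the (very unlikely) position collision
                continue
            bisect.insort(self._points, p)
            self._owner[p] = queue

    def remove_queue(self, queue: str) -> None:
        for r in range(self.replicas):
            p = _hash(f"{queue}#{r}")
            if self._owner.get(p) == queue:
                del self._owner[p]
                self._points.pop(bisect.bisect_left(self._points, p))

    def queue_for(self, job_id: str) -> str:
        if not self._points:
            raise LookupError("ring has no queues")
        # First ring position strictly after the job's hash, wrapping around.
        i = bisect.bisect_right(self._points, _hash(job_id)) % len(self._points)
        return self._owner[self._points[i]]


if __name__ == "__main__":
    ring = QueueRing(["q0", "q1", "q2", "q3"])
    jobs = [f"job_{i}" for i in range(1000)]
    before = {j: ring.queue_for(j) for j in jobs}
    ring.add_queue("q4")
    moved = [j for j in jobs if ring.queue_for(j) != before[j]]
    print(len(moved), "of", len(jobs), "jobs moved; all to q4:",
          all(ring.queue_for(j) == "q4" for j in moved))
```

Because a new queue only inserts points into existing arcs, every job that changes queue moves to the new one, and removing that queue restores the original assignment. Lookups do not modify the ring, which is one reason hash-based placement suits concurrent dispatch, though the thesis's own concurrency mechanism is not described in the abstract.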
Keywords/Search Tags: Green Computing, Big Data Analysis, Hadoop, MapReduce, Job Scheduling