
Energy-Efficiency Scheduling Algorithms For Hadoop Clusters

Posted on: 2016-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
GTID: 2308330473455118
Subject: Computer application technology
Abstract/Summary:
With the wide adoption of cloud computing in enterprises, Hadoop, with its HDFS distributed file system and MapReduce distributed computing model, has become the preferred platform for many IT companies. In large enterprises a Hadoop cluster usually consists of hundreds of nodes, so reducing the energy consumption of such large-scale clusters not only lowers the enterprise's operating costs but also saves energy and protects the environment. Traditional Hadoop clusters do not take energy efficiency into account: after the cluster starts, some nodes may stay idle for a period of time while still drawing power, which wastes a great deal of energy. At the same time, Hadoop's default scheduler is inefficient; jobs often take a long time to complete, and this longer running time itself causes additional energy consumption. Most past research on cluster energy has focused on reducing consumption through additional hardware, and such approaches adapt poorly. Existing cluster scheduling algorithms mainly target load balancing rather than energy efficiency, and because of the particularities of the Hadoop architecture, most of them cannot be applied to Hadoop directly. In view of these problems, this thesis studies energy-saving scheduling algorithms for Hadoop clusters. It analyzes the shortcomings of the Hadoop platform and its scheduler, and discusses in detail how to reduce the energy consumption of a Hadoop cluster.
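To make the idle-energy argument concrete, here is a minimal Python sketch. It assumes a simple linear power model (a node's power rises linearly from an idle level to a peak level with utilization), which is a common approximation and not the energy model proposed in this thesis. The wattages, the sleep-state power, and the function names are hypothetical and chosen only for illustration. The sketch compares keeping every node awake with packing the same load onto as few nodes as possible and putting the rest to sleep.

```python
import math

# Hypothetical per-node power figures in watts (illustrative, not measured).
P_IDLE = 100.0   # node powered on but doing no work
P_MAX = 200.0    # node fully utilized
P_SLEEP = 5.0    # node in an assumed low-power sleep state


def node_power(utilization: float) -> float:
    """Assumed linear power model: idle power plus a share proportional to utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return P_IDLE + (P_MAX - P_IDLE) * utilization


def cluster_energy(total_load: float, nodes: int, awake: int, hours: float) -> float:
    """Energy in watt-hours when `total_load` (measured in node-equivalents of work)
    is spread evenly over `awake` nodes and the remaining nodes sleep."""
    if awake < 1 or awake > nodes:
        raise ValueError("awake must be between 1 and nodes")
    if total_load > awake:
        raise ValueError("awake nodes cannot absorb the load")
    per_node_util = total_load / awake
    power = awake * node_power(per_node_util) + (nodes - awake) * P_SLEEP
    return power * hours


def min_awake_nodes(total_load: float, nodes: int) -> int:
    """Fewest nodes that can carry the load without exceeding 100% utilization each."""
    return max(1, min(nodes, math.ceil(total_load)))


if __name__ == "__main__":
    n, load = 100, 20  # 100 nodes carrying 20 node-equivalents of work
    print("all awake:   ", cluster_energy(load, n, n, hours=1.0), "Wh")
    k = min_awake_nodes(load, n)
    print("consolidated:", cluster_energy(load, n, k, hours=1.0), "Wh with", k, "nodes awake")
```

Under these assumed numbers, one hour at low load costs 12,000 Wh with all 100 nodes awake but 4,400 Wh with 20 nodes awake and 80 asleep. Because energy is power multiplied by time, the same model also shows why a slow scheduler wastes energy: doubling `hours` doubles the energy. A real policy would additionally have to account for wake-up latency and for HDFS block replicas stored on sleeping DataNodes, which are unreachable while those nodes sleep.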
The main contents are as follows. First, the thesis introduces the structure and framework of the Hadoop platform, its core components HDFS and the MapReduce programming model. Second, it proposes a new energy model and, based on it, a dynamic Hadoop energy management method that puts some nodes to sleep when the overall cluster load is low, reducing the cluster's overall energy consumption. Third, it proposes the HScheduler algorithm, which dynamically adjusts how Hadoop resources are allocated to jobs and reduces total energy consumption by minimizing the total running time (makespan) of multiple jobs. Next, it presents the Reduce Load Balancing (RLB) algorithm to address data skew in the MapReduce process; RLB shortens the running time of a single task, further reducing the energy consumption of the Hadoop platform. Finally, extensive real-world tests are conducted to validate the energy efficiency of the proposed algorithms.
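The abstract does not describe how RLB works, so the following is only a minimal Python sketch of one standard way to attack reduce-side skew, not the thesis's algorithm. It assumes the size of every intermediate key group is known before the reduce phase (in practice it would have to be estimated, for example by sampling map output) and assigns whole key groups to reducers with the greedy longest-processing-time-first (LPT) heuristic. Because the reduce phase ends only when its slowest reducer finishes, the largest per-reducer load serves as a rough proxy for reduce-phase time. The function names, key names, and sizes are hypothetical, and `zlib.crc32` is used only as a deterministic stand-in for the Java `hashCode` that Hadoop's default hash partitioning relies on.

```python
import heapq
import zlib
from typing import Dict, List, Tuple


def hash_loads(key_sizes: Dict[str, int], reducers: int) -> List[int]:
    """Size-oblivious placement, similar in spirit to hash partitioning:
    each key goes to hash(key) % reducers regardless of how big its group is."""
    loads = [0] * reducers
    for key, size in key_sizes.items():
        loads[zlib.crc32(key.encode("utf-8")) % reducers] += size
    return loads


def lpt_balance(key_sizes: Dict[str, int], reducers: int) -> Tuple[Dict[str, int], List[int]]:
    """Greedy LPT placement: take key groups from largest to smallest and
    give each one to the currently least-loaded reducer."""
    if reducers < 1:
        raise ValueError("need at least one reducer")
    heap = [(0, r) for r in range(reducers)]  # entries are (current load, reducer id)
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    loads = [0] * reducers
    # Sort by size descending; break ties by key name so the result is deterministic.
    for key, size in sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, r = heapq.heappop(heap)
        assignment[key] = r
        loads[r] = load + size
        heapq.heappush(heap, (loads[r], r))
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical skewed intermediate data: record counts per map-output key.
    sizes = {"a": 50, "b": 30, "c": 20, "d": 20, "e": 10, "f": 10}
    print("hash-style loads:", hash_loads(sizes, 3))
    assignment, loads = lpt_balance(sizes, 3)
    print("LPT assignment:", assignment)
    print("LPT loads:", loads)  # [50, 50, 40]
```

In this example no placement can do better than a maximum load of 50, since key "a" alone holds 50 records, and LPT reaches that bound. In an actual Hadoop job, a size-aware placement like this would be plugged in through a custom subclass of MapReduce's `Partitioner` (overriding `getPartition(key, value, numPartitions)`), which is how applications replace the default hash-based placement.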
Keywords/Search Tags: cloud computing, energy saving, scheduling optimization, minimized makespan