Research On Energy-aware Load Balancing In Heterogeneous Hadoop Cluster

Posted on: 2022-09-09 | Degree: Master | Type: Thesis
Country: China | Candidate: C Huang | Full Text: PDF
GTID: 2518306509960179 | Subject: Software engineering
Abstract/Summary:
With the rise and spread of the Internet, we have entered an era in which everyone produces data. The volume of data to be processed and stored is growing exponentially, while traditional data processing and storage technologies have hit a bottleneck. Cloud computing emerged in response as the mainstream platform for massive data processing and storage. Hadoop, released by the Apache Software Foundation, is one of the open-source distributed parallel computing frameworks for cloud computing and is widely used in many large companies, such as Yahoo, Facebook, Amazon, Google, and Microsoft. As a result, Hadoop has gradually become the mainstream framework for massive data processing and storage on cloud platforms.

The urgent problems to be solved in Hadoop clusters are as follows: (1) Because user requirements are heterogeneous and servers are upgraded and replaced over time, a Hadoop YARN cluster is usually composed of heterogeneous server nodes with different resource configurations. In such a heterogeneous Hadoop YARN cluster, load is usually imbalanced among the server nodes. (2) The built-in schedulers of the Hadoop YARN framework are not designed for energy efficiency, and the large power consumption of the clusters they manage cannot be ignored. (3) Resource allocation is unbalanced and resource utilization is low across the server nodes of a heterogeneous Hadoop YARN cluster, which lowers the overall performance of the cluster and makes it difficult to meet the varied computing and storage requirements of the big data era. To address these problems, this thesis focuses on three research directions in a heterogeneous Hadoop YARN cluster environment: energy consumption, performance, and load balancing.

The main contributions of this thesis are as follows: (1) This thesis designs a system framework for heterogeneous Hadoop YARN clusters. The framework is composed of three basic modules: Task Analyzer, Online Dispatcher, and DVFS Adjuster. This thesis formulates task scheduling in a Hadoop YARN cluster as an energy consumption optimization problem subject to a user-specified deadline. (2) This thesis develops a heuristic algorithm based on load balancing and deadlines. The heuristic algorithm assigns MapReduce tasks to server nodes according to the load balance factor and the user-specified deadline in order to improve energy efficiency. (3) This thesis carries out extensive experiments on a real Hadoop YARN cluster consisting of five server nodes and uses three types of MapReduce jobs to evaluate the effectiveness of the heuristic algorithm. The results show that, compared with three alternative methods applied to similar problems, the heuristic algorithm completes tasks within the deadline and minimizes cluster energy consumption while balancing the cluster load.
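
The abstract does not give the formulation of the heuristic, so the following Python sketch is only an illustration of the general idea of deadline-constrained, load-aware, DVFS-based task placement. It is not the thesis's algorithm. All names (`Node`, `Task`, `assign`, `alpha`), the per-frequency power table, and the cost function (energy plus a weighted standard deviation of node finish times, standing in for a load balance factor) are assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Node:
    name: str
    levels: dict              # hypothetical DVFS table: frequency (GHz) -> power (W)
    busy_until: float = 0.0   # time (s) at which already-assigned work finishes


@dataclass
class Task:
    name: str
    cycles: float             # work in giga-cycles; runtime = cycles / frequency


def assign(tasks, nodes, deadline, alpha=1.0):
    """Greedily place each task on the (node, frequency) pair with the lowest
    energy + alpha * load-spread cost among pairs that meet the deadline.
    The cost function is an assumption of this sketch, not taken from the thesis."""
    plan = []
    # Place the largest tasks first so they have the most slack to work with.
    for task in sorted(tasks, key=lambda t: t.cycles, reverse=True):
        best = None
        for node in nodes:
            for freq, power in node.levels.items():
                runtime = task.cycles / freq
                finish = node.busy_until + runtime
                if finish > deadline:
                    continue  # this choice would miss the deadline
                energy = power * runtime  # joules
                # Finish times of all nodes if the task were placed here.
                loads = [finish if n is node else n.busy_until for n in nodes]
                cost = energy + alpha * pstdev(loads)
                if best is None or cost < best[0]:
                    best = (cost, node, freq, finish, energy)
        if best is None:
            raise ValueError(f"no node can finish {task.name} before the deadline")
        _, node, freq, finish, energy = best
        node.busy_until = finish
        plan.append((task.name, node.name, freq, finish, energy))
    return plan
```

Calling `assign(tasks, nodes, deadline=10.0)` returns one `(task, node, frequency, finish_time, energy)` tuple per task and raises `ValueError` when no node and frequency can meet the deadline. Raising `alpha` trades energy for a more even load across nodes.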
Keywords/Search Tags: cloud computing, Hadoop YARN, load balancing, deadline, DVFS