
Research On Load Balance Of Hadoop Cloud Computing Platform

Posted on: 2015-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhao
Full Text: PDF
GTID: 2298330434465760
Subject: Computer application technology
Abstract/Summary:
Load balancing is very important in a Hadoop cluster: a proper load-balancing strategy improves not only the performance of the cluster but also the client's experience. The task scheduling strategy and method have a great influence on the load distribution in a Hadoop cluster, but the present Hadoop scheduling methods do not take load balance into consideration. This thesis studies load balance mainly from the angle of task scheduling, because considering load balance during scheduling matters more than adjusting it after the cluster has become imbalanced.

The thesis introduces in detail cloud computing, the distributed framework MapReduce, HDFS, and Hadoop, an open-source implementation of MapReduce in Java. It mainly analyzes the execution process of a job in Hadoop and the scheduling algorithms used in a Hadoop cluster, such as FIFO, the Capacity Scheduler, and the Fair Scheduler. From the point of view of the cluster's load balance, we propose a new scheduling method named DFLB (Dynamic Feedback Load Balance), which makes full use of the heartbeat information that a TaskTracker sends when asking for new tasks. We collect the execution information and feed it back to the JobTracker, which uses it to decide whether a scheduling decision satisfies load balance. We then continue to collect information on the newly scheduled tasks, eventually forming a closed loop of collect-feedback-use-collect. We also define the load balance of a cluster mathematically; this definition is the basis for consideration and judgment when the scheduler assigns new tasks to a TaskTracker. In addition, we put forward the concept of dynamic priority during task scheduling, which takes fairness among jobs into account.

After analyzing the flow of dynamic feedback load balance in detail, we construct a Hadoop cluster to test the research results, and then compare and analyze them. The analysis shows that, compared with the common Hadoop scheduling algorithms, DFLB not only achieves load balance but also improves the jobs' average response time. It also shows that DFLB affects the resource utilization rate and the parallelism between jobs, making significant progress in the load balance of the Hadoop cloud computing platform.
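
The abstract does not state the thesis's actual load-balance formula or feedback rule, so the following is only a minimal sketch of a collect-feedback-use loop under stated assumptions. It assumes that node load is the fraction of occupied task slots, that cluster imbalance is the population standard deviation of node loads, and that a task is assigned only if it does not push imbalance above a threshold or make it worse. The names `Heartbeat`, `FeedbackScheduler`, `imbalance`, and `threshold` are hypothetical and are not Hadoop APIs.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Heartbeat:
    """Hypothetical load report from one TaskTracker (not a Hadoop class)."""
    tracker: str
    running_tasks: int
    slots: int  # assumed to be greater than zero

    @property
    def load(self) -> float:
        # Assumed load measure: fraction of occupied task slots.
        return self.running_tasks / self.slots


def imbalance(loads):
    """Assumed imbalance metric: population standard deviation of node loads."""
    return pstdev(loads) if len(loads) > 1 else 0.0


class FeedbackScheduler:
    """Toy stand-in for the JobTracker side of the feedback loop."""

    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold
        self.latest = {}  # tracker name -> most recently reported load

    def on_heartbeat(self, hb: Heartbeat) -> bool:
        """Collect the report, then decide whether to assign one more task."""
        self.latest[hb.tracker] = hb.load  # collect and feed back
        if hb.running_tasks >= hb.slots:
            return False  # no free slot on this tracker
        # Use: project cluster imbalance if this tracker took one more task.
        projected = dict(self.latest)
        projected[hb.tracker] = (hb.running_tasks + 1) / hb.slots
        before = imbalance(list(self.latest.values()))
        after = imbalance(list(projected.values()))
        # Assign if the cluster stays within the threshold or does not get worse.
        return after <= max(before, self.threshold)
```

In this sketch an idle tracker is granted a task while a heavily loaded one is refused until later heartbeats show the cluster evening out. Dynamic priority among jobs is not modeled here.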
Keywords/Search Tags: Big Data, Dynamic Feedback, MapReduce, Hadoop, Load Balance, Dynamic Priority, HDFS