
Research On Load Balance Algorithm In Heterogeneous Hadoop Cluster

Posted on: 2016-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: D P Liu
GTID: 2298330467492623
Subject: Computer Science and Technology
Abstract/Summary:
Hadoop is one of the most important open-source frameworks for big data. It consists of two main parts: MapReduce and HDFS. MapReduce is used for data processing; to improve the efficiency and safety of data management, each MapReduce job is divided into two phases, Map and Reduce. HDFS is responsible for data storage management. It is a highly fault-tolerant system designed to run on low-cost commodity hardware, which it achieves by detecting and responding to hardware failures.

Hadoop performs well when the cluster nodes are homogeneous. In practice, however, the homogeneity assumption does not always hold. A heterogeneous environment contains devices that differ greatly in computation and communication capacity, architecture, memory, and power. When such nodes are assigned the same amount of data to process, the load becomes unbalanced. The load balancing problem has two aspects: balancing data processing in MapReduce, and data placement in HDFS.

For the load balancing problem in MapReduce, we address how to assign the data produced by the Map phase so that the execution times of the Reduce tasks are balanced. We propose a novel load balancing algorithm based on node performance (LBNP), which reduces the input data assigned to low-performance nodes. Simulation results indicate that all Reduce tasks can complete at the same time, which shortens the whole Reduce phase and thus improves the efficiency of MapReduce.

For the load balancing problem in HDFS, we propose a node performance evaluation model that accounts for both the unequal performance of the nodes and the skewed usage of files in HDFS. Based on this model, we propose a new data placement algorithm that prefers less-loaded nodes when selecting the nodes that hold a data block's backup replicas. This improves the degree of load balancing and, in turn, the efficiency of MapReduce.
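The abstract does not spell out how LBNP decides how much Map output each Reduce task receives. As an illustration only, the sketch below assumes each Reduce node's performance can be summarized by a single throughput number (records per unit time) and assigns Map-output partitions greedily to whichever node would finish earliest, so slower nodes end up with less input. The names `assign_partitions` and `finish_times`, the partition sizes, and the speed values are all hypothetical; this is a generic greedy heuristic, not the thesis's LBNP algorithm.

```python
from typing import Dict, List


def assign_partitions(partition_sizes: Dict[str, int],
                      node_speed: Dict[str, float]) -> Dict[str, List[str]]:
    """Assign Map-output partitions to Reduce nodes, giving slower nodes less data.

    partition_sizes: partition id -> number of records (assumed known after Map).
    node_speed: node name -> assumed throughput in records per unit time.
    """
    assignment: Dict[str, List[str]] = {node: [] for node in node_speed}
    load: Dict[str, int] = {node: 0 for node in node_speed}  # records assigned so far

    # Largest partitions first (LPT ordering); ties broken by id for determinism.
    for pid, size in sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        # Pick the node whose estimated finish time is smallest after taking this
        # partition; ties go to the lexicographically smaller node name.
        best = min(node_speed, key=lambda n: ((load[n] + size) / node_speed[n], n))
        assignment[best].append(pid)
        load[best] += size
    return assignment


def finish_times(assignment: Dict[str, List[str]],
                 partition_sizes: Dict[str, int],
                 node_speed: Dict[str, float]) -> Dict[str, float]:
    """Estimated Reduce finish time per node: assigned records / throughput."""
    return {node: sum(partition_sizes[p] for p in pids) / node_speed[node]
            for node, pids in assignment.items()}


parts = {f"p{i}": 100 for i in range(6)}
speed = {"fast": 2.0, "slow": 1.0}
plan = assign_partitions(parts, speed)
print(plan)
print(finish_times(plan, parts, speed))
```

With six equal partitions of 100 records, the node that is twice as fast receives four partitions and the slower node two, so both finish at 200 time units; a performance-blind even split (three partitions each) would leave the slow node finishing at 300.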
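The abstract likewise gives no formula for the node performance evaluation model. A minimal sketch, assuming a node's load can be scored as the usage-weighted "heat" of the blocks it already stores divided by a relative performance score, and that a new block's replicas go to the lowest-scoring nodes. `Node`, `heat`, `load_score`, and `choose_replica_nodes` are hypothetical names, and unlike HDFS's default placement policy this sketch ignores rack awareness entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    performance: float  # assumed relative capacity score; higher means stronger
    heat: float = 0.0   # assumed usage-weighted load of the blocks stored here


def load_score(node: Node) -> float:
    """Hypothetical evaluation model: stored load relative to node capacity."""
    return node.heat / node.performance


def choose_replica_nodes(nodes: List[Node], block_heat: float,
                         replicas: int = 3) -> List[str]:
    """Pick the `replicas` least-loaded nodes for a new block and record the load."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    # Lowest score first; ties broken by name so the choice is deterministic.
    chosen = sorted(nodes, key=lambda n: (load_score(n), n.name))[:replicas]
    for node in chosen:
        node.heat += block_heat  # later placements see the updated load
    return [node.name for node in chosen]


cluster = [Node("A", 4.0, 8.0), Node("B", 1.0, 1.0),
           Node("C", 2.0, 1.0), Node("D", 1.0, 3.0)]
print(choose_replica_nodes(cluster, block_heat=2.0, replicas=2))
print(choose_replica_nodes(cluster, block_heat=2.0, replicas=2))
```

The first call places the replicas on C and B, the two lowest-scoring nodes. Because their heat is updated, the second call picks C and A, so the high-performance node A starts absorbing load even though it already stores the most data.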
Keywords/Search Tags: Hadoop, MapReduce, HDFS, Load Balance, Data Placement Strategy, Heterogeneous Environment