
A Key-Value-Based Algorithm for Solving Reduce Load Imbalance in MapReduce

Posted on: 2018-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Luo
Full Text: PDF
GTID: 2348330518453964
Subject: Computer technology
Abstract/Summary:
The MapReduce computing framework, with its simple programming model, is used to process logs, documents, analysis reports, and other complex data sets. However, because the framework uses a default hash partitioning mechanism when processing large amounts of data, it easily produces unevenly sized partitions, that is, data skew. Although the open-source Hadoop system lets users supply a custom partitioning method, when the data is complex and follows no regular pattern the distribution of the input data is not known in advance, so a suitable user-defined partitioning method cannot be written, and data skew during MapReduce computation is hard to avoid. To solve this problem of imbalanced data partitioning, this thesis presents a method that partitions <key, value> pairs to Reducers according to their keys. The method distinguishes two cases: (1) the traditional one, in which partition numbers and Reducer task numbers are in a one-to-one relationship; (2) one in which the correspondence between partitions and Reducers is determined by a partition algorithm and an allocation algorithm, in order to solve the Reduce load imbalance that arises in MapReduce tasks. The method is evaluated experimentally and the results are analyzed. The experimental results show that the input-size key partition strategy effectively balances the load across Reduce tasks, greatly reduces data skew on the Reduce side, and improves task execution efficiency.
Keywords/Search Tags: Key partition, Data skew, Big data, Load balance, MapReduce
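
The abstract contrasts default hash partitioning with a scheme in which an allocation step decides which Reducer receives which keys. The sketch below illustrates that contrast only. It is not the thesis's algorithm, which the abstract does not specify. It assumes the per-key record counts are already known, uses `zlib.crc32` as a deterministic stand-in for the hash function, and uses a generic greedy "heaviest key to the least-loaded Reducer" rule as a hypothetical allocation step. The function names and sample counts are illustrative.

```python
import heapq
import zlib


def hash_partition_loads(key_counts, num_reducers):
    """Reducer loads when each key goes to hash(key) mod num_reducers."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        # Assumption: crc32 stands in for the framework's hash function.
        idx = zlib.crc32(key.encode("utf-8")) % num_reducers
        loads[idx] += count
    return loads


def greedy_allocation_loads(key_counts, num_reducers):
    """Hypothetical allocation: heaviest key first, to the least-loaded Reducer."""
    # Min-heap of (current load, reducer index).
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending count, then by key for a deterministic order.
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    loads = [0] * num_reducers
    for key, reducer in assignment.items():
        loads[reducer] += key_counts[key]
    return loads, assignment


# Illustrative skewed input: one key dominates the record count.
key_counts = {"a": 50, "b": 30, "c": 10, "d": 5, "e": 5}
hash_loads = hash_partition_loads(key_counts, 2)
greedy_loads, assignment = greedy_allocation_loads(key_counts, 2)
print("hash partitioning loads:", hash_loads)
print("greedy allocation loads:", greedy_loads)
```

With the sample counts, the greedy rule splits the 100 records evenly across the two Reducers, while the hash-based split depends entirely on where the heavy keys happen to land. In a real job the key counts would have to be estimated, for example by sampling the map output, which this sketch leaves out.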