
Research on Lightweight Load Balancing under MapReduce

Posted on: 2018-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Wang
GTID: 2348330521950732
Subject: Software engineering
Abstract/Summary:
Since Google proposed the MapReduce programming model, it has been widely adopted for large-scale data processing and is now regarded as one of the most efficient tools for that purpose. The model has a weakness, however: the default hash partitioning used in the partition step can cause reduce skew, where some reducers receive far more data than others. This thesis proposes a simple and efficient algorithm that addresses the data skew caused by how data is divided among reducers in the reduce phase.

The approach combines a lightweight sampling method with a heuristic partition strategy to balance data in the reduce phase. Sampling is kept lightweight by running it in parallel with the Map phase and by simplifying the common (normal) key set. The partition decision is then made from a data distribution estimated from the sample data set.

First, the thesis uses the InputFormat interface of the MapReduce framework to sample in parallel within the Map phase, and proposes a method based on random sampling and reservoir sampling to obtain the sample data set. Second, the sampled data are collected and merged to identify the high-frequency key set, and the normal key set is simplified according to the number of high-frequency keys. Third, a load list for the real data set is constructed from the high-frequency key set and the simplified normal key set. Finally, a heuristic partition algorithm is designed according to the frequencies of the high-frequency keys in the load list. To handle extremely large key sets, the thesis also proposes an improved heuristic partition algorithm with a splitting threshold.

The experimental results show that the lightweight load balancing algorithm effectively reduces data skew in the reduce phase and improves system performance.
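The pipeline described in the abstract (sample, estimate a load list, assign keys to reducers) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not give its details, so the sketch substitutes standard reservoir sampling (Algorithm R) and a greedy largest-first assignment as a stand-in heuristic. All names (`reservoir_sample`, `build_load_list`, `greedy_partition`, `top_n`, `split_threshold`) are hypothetical, and spreading an oversized key evenly over all reducers is an assumption about what a splitting threshold might do.

```python
import random
from collections import Counter


def reservoir_sample(records, k, rng):
    """Algorithm R: keep a uniform random sample of k items from a stream."""
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                sample[j] = rec         # replace with probability k / (i + 1)
    return sample


def build_load_list(sample_keys, total_records, top_n):
    """Scale sample counts up to the full data set (assumed estimator).

    Returns the top_n most frequent keys with estimated loads, plus the
    aggregated estimated load of all remaining (normal) keys.
    """
    counts = Counter(sample_keys)
    scale = total_records / len(sample_keys)
    heavy = [(key, c * scale) for key, c in counts.most_common(top_n)]
    heavy_keys = {key for key, _ in heavy}
    normal_load = scale * sum(c for key, c in counts.items()
                              if key not in heavy_keys)
    return heavy, normal_load


def greedy_partition(heavy, num_reducers, split_threshold=None):
    """Assign heavy keys largest-first to the currently least-loaded reducer.

    A key whose load exceeds split_threshold is spread evenly over all
    reducers (assumption: its partial results are merged afterwards).
    """
    loads = [0.0] * num_reducers
    assignment = {}
    for key, load in sorted(heavy, key=lambda kv: kv[1], reverse=True):
        if split_threshold is not None and load > split_threshold:
            for r in range(num_reducers):
                loads[r] += load / num_reducers
            assignment[key] = list(range(num_reducers))
        else:
            r = min(range(num_reducers), key=loads.__getitem__)
            loads[r] += load
            assignment[key] = [r]
    return assignment, loads
```

In this sketch the normal keys are only summarized as one aggregate load. They would still be routed by the default hash partitioner, which the code does not model.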
Keywords/Search Tags: MapReduce, Load balancing, Data skew, Lightweight, Parallel sampling