The Research Of Load Balancing In Mapreduce Based On Data Locality

Posted on: 2015-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
GTID: 2298330467450817
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of society, the volume of data on the Internet is growing explosively, and massive amounts of data are generated every day. Processing this data quickly and efficiently is a major challenge. Against this background, cloud computing, a technology that aims to process large-scale data, is developing rapidly.

MapReduce is one of the cloud computing technologies, and it processes massive data in parallel. Its clear advantages, such as high scalability and high fault tolerance, have made it a popular programming model in cloud computing. Hadoop is an implementation of MapReduce, and many companies and universities use it to develop cloud computing applications or to conduct research on them. However, its operating mechanism exposes some drawbacks. When a cluster processes skewed data, the default partition algorithm cannot assign the data evenly to the reducers. This leads to load imbalance among the processing nodes and also increases network load.

This thesis presents DALP, a partition algorithm based on data locality that optimizes the default partition algorithm. Sampling is used to preprocess the data and output the frequency of each key. A data-aggregation strategy is then built from these frequencies. The strategy assigns a suitable amount of data to each Reduce node, which relieves load imbalance and improves cluster performance. To save bandwidth, data locality is studied in depth, and a partition step based on it assigns the aggregated data to appropriate Reduce nodes, which reduces network load and transmission time. DALP was tested in multiple groups of experiments, and the results showed that it effectively improves cluster performance.
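The abstract does not give the details of DALP, so the following is only a minimal sketch of the general idea it describes: sample the input to estimate how often each key occurs on each node, then assign keys to Reduce nodes so that load stays balanced and locality is preferred. The function names, the greedy heaviest-key-first rule, and the `slack` parameter are assumptions made for illustration and are not taken from the thesis.

```python
import random
from collections import Counter, defaultdict


def sample_key_counts(records, sample_rate, seed=0):
    """Estimate per-node key frequencies from a random sample.

    records: iterable of (key, node) pairs, where node is the machine
    that holds the record (hypothetical input format).
    """
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    for key, node in records:
        if rng.random() < sample_rate:
            counts[key][node] += 1
    return counts


def assign_keys(key_node_counts, reducers, slack=0):
    """Greedily assign each key to one reducer (illustrative only).

    Keys are processed from most to least frequent. Among reducers whose
    current load is within `slack` of the lightest one, the reducer that
    already holds the most records of the key locally is chosen.
    """
    loads = {r: 0 for r in reducers}
    assignment = {}
    totals = {k: sum(c.values()) for k, c in key_node_counts.items()}
    for key in sorted(totals, key=lambda k: (-totals[k], k)):
        min_load = min(loads.values())
        candidates = [r for r in reducers if loads[r] <= min_load + slack]
        best = max(candidates, key=lambda r: key_node_counts[key].get(r, 0))
        assignment[key] = best
        loads[best] += totals[key]
    return assignment, loads
```

In this sketch a larger `slack` trades some load balance for more local assignments, while `slack=0` always fills the lightest reducer and uses locality only to break ties.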
Keywords/Search Tags: Cloud Computing, MapReduce, Load Balancing, Data Locality