
Research and Implementation of MapReduce-Based Aggregation Algorithms

Posted on: 2011-06-22 | Degree: Master | Type: Thesis
Country: China | Candidate: W Gao | Full Text: PDF
GTID: 2248330395457984 | Subject: Computer application technology
Abstract/Summary:
The development of information technology, and in particular the rapid development of network technology, has led to rapid growth in the volume of data. Much current research therefore centers on how to store and manage massive data effectively so that it can be processed efficiently. Aggregation is one of the most typical data pre-processing methods and is of great significance for improving query efficiency on massive data. However, aggregating massive data requires enormous computing power and storage capacity, which an ordinary PC cannot provide. Research on aggregation computing over massive data is therefore very important.

Building on the distributed Google File System (GFS) and the parallel computing framework MapReduce, we study the scalability and fault tolerance of GFS and the parallelism and highly scalable computing power of MapReduce on large data sets. This thesis proposes novel MapReduce-based aggregation algorithms for massive data, covering selection, projection, and equi-join on relational data. On this basis we implement MapReduce-based counting, summing, averaging, maximum, and minimum operations, among others. By making use of the cluster's computing power, storage capacity, and network bandwidth, the algorithms improve the aggregation efficiency of massive data, reduce processing time, and greatly improve query efficiency.

This thesis also proposes a MapReduce-based algorithm for generating a global closed data cube. Experimental results show that, by exploiting cluster computing resources, the algorithm generates the closed data cube quickly and efficiently and reduces query time on the global closed data cube.
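The abstract does not give the algorithms themselves, so the following is only a minimal single-process sketch of how selection, projection, and the basic aggregates (count, sum, average, max, min) could be phrased as map, shuffle, and reduce steps. It is not the thesis's implementation and does not use Hadoop; the function names, the `SALES` records, and the field names are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

def map_phase(records, key_field, value_field, predicate=lambda r: True):
    """Mapper: apply selection (predicate) and projection (two fields only)."""
    for record in records:
        if predicate(record):
            yield record[key_field], record[value_field]

def to_partial(value):
    """Wrap one value as a partial aggregate: (count, sum, max, min)."""
    return (1, value, value, value)

def merge(a, b):
    """Combine two partial aggregates; associative, so it can run in any order."""
    return (a[0] + b[0], a[1] + b[1], max(a[2], b[2]), min(a[3], b[3]))

def shuffle(pairs):
    """Group mapper output by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: fold each group's values into the final aggregates."""
    results = {}
    for key, values in groups.items():
        partial = to_partial(values[0])
        for value in values[1:]:
            partial = merge(partial, to_partial(value))
        count, total, maximum, minimum = partial
        results[key] = {
            "count": count,
            "sum": total,
            "avg": total / count,  # derived at the end from sum and count
            "max": maximum,
            "min": minimum,
        }
    return results

def aggregate(records, key_field, value_field, predicate=lambda r: True):
    return reduce_phase(shuffle(map_phase(records, key_field, value_field, predicate)))

# Hypothetical input records, for illustration only.
SALES = [
    {"region": "north", "amount": 10},
    {"region": "south", "amount": 4},
    {"region": "north", "amount": 30},
    {"region": "south", "amount": 6},
    {"region": "north", "amount": -5},
]

# Selection keeps positive amounts; projection keeps (region, amount).
result = aggregate(SALES, "region", "amount", predicate=lambda r: r["amount"] > 0)
print(result)
```

The average is carried as a (sum, count) pair and divided only at the end, because averages of partial groups cannot be merged directly while sums and counts can. That property is what would let the same `merge` step run on partial results in a distributed setting.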
Keywords/Search Tags: closed data cube, aggregation operation, MapReduce, Hadoop