Research On Data Cube Technology Based On MapReduce

Posted on:2014-04-20

Degree:Master

Type:Thesis

Country:China

Candidate:L Chen

Full Text:PDF

GTID:2268330425484452

Subject:Computer technology

Abstract/Summary:


With the rapid development of the internet and the information technology industry, the amount of data generated by networks is ever-growing. Large data sets contain more useful information, but they also bring more challenges. On-Line Analytical Processing (OLAP), an important technology for data storage and analysis, faces the challenge of a huge amount of computation. The Data Cube is the primary means of OLAP, so how to process massive Data Cube data efficiently is a key issue in both OLAP research and application. Google's MapReduce is a simplified distributed programming model for processing large-scale data. Based on this distributed parallel model, this thesis presents parallel clustering, update, and query of the Data Cube. The main research achievements and innovative points are as follows:

(1) Parallel clustering of the Data Cube: Based on the equivalence relation between the semantic features and the multiple dimensions of the Data Cube, a parallel semantic Cube hierarchical clustering algorithm based on the MapReduce framework is proposed. The Data Cube can be clustered rapidly, and ultimately the upper and lower bounds of the equivalence classes are saved, which realizes compressed storage of the Data Cube. This method effectively saves storage space and also speeds up the clustering procedure. Because cluster information and hierarchical information are saved, it also enables rapid updates of the Data Cube and makes it possible to analyze OLAP query behavior.

(2) Incremental maintenance of the Data Cube's hierarchical clustering: Based on Data Cube equivalence classes, combined with the hierarchical relations between the equivalence classes, an efficient batch update algorithm for the Data Cube in the MapReduce parallel framework is proposed. This solves the problem of low efficiency caused by maintaining large amounts of data.

(3) Parallel OLAP queries: Based on Data Cube equivalence classes, parallel optimizations of OLAP point queries and range queries are realized. In addition, a cache-based OLAP query optimization algorithm is proposed in the improved MapReduce model. By defining the various operations in an OLAP query, multiple OLAP queries are processed in parallel, which greatly improves query efficiency.

This thesis also analyzes in detail the parallelization of the various operations on the semantic Cube, and designs implementations of these operations under the MapReduce model. Parallel and traditional algorithms are compared to demonstrate the superiority of the parallel algorithms.
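The idea in point (1) of storing only equivalence classes and their bounds can be illustrated with a small sketch. This is not the thesis's algorithm: it is a single-process Python imitation of a map, shuffle, and reduce pass, and it assumes that two cells are equivalent when they are covered by the same set of base tuples (the cover equivalence used in quotient-cube work). The names `map_phase`, `reduce_phase`, `build_classes`, and `bounds` are hypothetical, and SUM is chosen as the aggregate only for illustration.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # marker for an aggregated ("any value") dimension


def map_phase(row_id, dims, measure):
    """Emit every cell (one per cuboid) that covers this base tuple."""
    for mask in product([False, True], repeat=len(dims)):
        cell = tuple(d if keep else ALL for d, keep in zip(dims, mask))
        yield cell, (row_id, measure)


def reduce_phase(cell, values):
    """Aggregate one cell: its cover set (row ids) and the SUM of its measure."""
    cover = frozenset(rid for rid, _ in values)
    total = sum(m for _, m in values)
    return cell, cover, total


def build_classes(rows):
    """Group cells into equivalence classes keyed by their cover set."""
    # Shuffle step: collect map output by cell key.
    groups = defaultdict(list)
    for rid, (dims, measure) in enumerate(rows):
        for cell, value in map_phase(rid, dims, measure):
            groups[cell].append(value)
    # Reduce step, then merge cells that share a cover set (assumed relation).
    classes = defaultdict(lambda: {"cells": [], "sum": 0})
    for cell, values in groups.items():
        _, cover, total = reduce_phase(cell, values)
        classes[cover]["cells"].append(cell)
        classes[cover]["sum"] = total
    return classes


def specificity(cell):
    """Number of dimensions that are not aggregated."""
    return sum(v != ALL for v in cell)


def generalizes(a, b):
    """True if cell a is equal to or more general than cell b."""
    return all(x == ALL or x == y for x, y in zip(a, b))


def bounds(cells):
    """Return (upper bound, lower bounds) of one equivalence class."""
    upper = max(cells, key=specificity)  # most specific cell
    lowers = [c for c in cells
              if not any(o != c and generalizes(o, c) for o in cells)]
    return upper, sorted(lowers)


if __name__ == "__main__":
    rows = [(("a", "x"), 1), (("a", "y"), 2)]
    for cover, info in build_classes(rows).items():
        print(sorted(cover), info["sum"], bounds(info["cells"]))
```

For the two sample rows, the six distinct cells collapse into three classes, so only three aggregate values and their bounds would need to be stored. In a real Hadoop job the shuffle would be done by the framework and the class merge would be a second pass.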

Keywords/Search Tags:

OLAP, Hierarchical Cluster, OLAP Query Optimization, Hadoop, MapReduce


Related items

1	Research On Distributed OLAP Query Optimization Based On Hive
2	Multi-Query Optimization Strategy Design And Implementation In Column-based OLAP System
3	Based On Materialized Views Of OLAP Query Performance Optimization Research And Application
4	Research On Query Optimization In Dameng OLAP
5	Research And Implementation Of OLAP Query Optimization
6	OLAP Query Performance Study
7	Build Dynamic OLAP Query Base On XML
8	Partial aggregation and query processing of OLAP cubes
9	Research On Query Optimization In Data Warehouses
10	Research And Implementation Of Construction And Query Techniques Of Histogram Data Cube Based On Hadoop