Research And Implementation Of Online Multiple Aggregation Query System Over The Big Data

Posted on:2016-12-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y G Dang

Full Text:PDF

GTID:2428330542489570

Subject:Computer technology

Abstract/Summary:

With the advent of the big data era, enterprise data has grown explosively, and business decisions must be made on the basis of massive historical data. OLAP makes such data much easier to process. However, because of the large volume and high dimensionality of the data, OLAP still faces serious challenges in computation and storage, and processing in a distributed environment only alleviates them. The data cube emerged to improve query efficiency over large data, but it has a major drawback: its construction costs enormous time and space. The Closed Data Cube, an effective lossless compression technique, was introduced to address this problem. The data cube is also inflexible, since it often supports only one type of aggregate query; the Histogram Data Cube enriches it considerably.

Building on existing compression methods, this thesis proposes a new compression method that optimizes the storage structure of the histogram data cube. Derived closed tuples are stored together with their basic tuple, the one that has the smallest number of tuples, along with the measures corresponding to each closed tuple and the closed tuple codes, where one integer code represents one closed tuple. This effectively reduces storage space. The thesis applies an existing reversed count invert method to the measure vector in order to support approximate queries, thereby reducing the cost of the measure vector. It also improves the MRC-Cubing algorithm so that the All tuple and the basic tuples can be computed more simply and efficiently, and proposes a method for computing large closed tuples that balances the load across tasks.

Building a closed histogram data cube is time-consuming, so new data should be integrated into it quickly. The thesis analyzes the benefit and cost of incrementally updating a data cube and proposes two distributed incremental update methods: one merges the new data directly with the existing cube, and the other merges the two cubes. Both methods take much less time than recomputation, and users can choose between them according to their needs.

To speed up queries over the closed histogram data cube, the thesis presents a query method based on an inverted index built on the MapReduce framework; the improvement in query speed is more pronounced over large amounts of data. To support near-real-time online queries over the closed histogram data cube, HBase is used as the storage platform for the histogram cube and the index, and interactive queries are answered using the query key, the query code, and the inverted index. Experiments on the TPC-DS test data set demonstrate the compression of the data cube, the advantage of incremental updating over recomputation, and the improved query efficiency relative to the previous query algorithm.
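
The cube-merging update and the inverted-index lookup described above can be illustrated with a small single-machine sketch. It is not the thesis's implementation: it assumes a plain, uncompressed histogram cube held in memory, ignores closed-tuple compression, MapReduce, and HBase, and all names (`merge_cubes`, `build_inverted_index`, `query`) are hypothetical. A cell is modeled as a tuple of dimension values, with "*" marking an aggregated dimension, and its measure as a histogram of bucket counts.

```python
from collections import Counter, defaultdict


def merge_cubes(existing, delta):
    """Merge two histogram cubes by adding the histograms of matching cells.

    Assumption: each cube is a dict {cell: Counter(bucket -> count)}.
    Bucket counts are additive, so cells present in both cubes are summed
    bucket by bucket and cells present in only one cube are copied.
    """
    merged = {cell: Counter(hist) for cell, hist in existing.items()}
    for cell, hist in delta.items():
        merged.setdefault(cell, Counter()).update(hist)
    return merged


def build_inverted_index(cube):
    """Map (dimension position, value) -> set of cells holding that value."""
    index = defaultdict(set)
    for cell in cube:
        for position, value in enumerate(cell):
            index[(position, value)].add(cell)
    return index


def query(cube, index, conditions):
    """Return {cell: histogram} for cells matching every (position, value) pair.

    The posting sets of all conditions are intersected, so only cells that
    satisfy every condition are read from the cube.
    """
    if not conditions:
        return dict(cube)
    postings = [index.get(condition, set()) for condition in conditions]
    matching_cells = set.intersection(*postings)
    return {cell: cube[cell] for cell in matching_cells}
```

Because the measure is a histogram rather than a single number, more than one aggregate can be read from the same cell after a query; for example, `sum(hist.values())` gives the count of underlying records in that cell.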

Keywords/Search Tags:

Closed Data Cube, Histogram Cube, Incremental Updating, Online Query, Compression Storage

Related items

1	Research And Implementation Of Online Multiple Aggregation Query System Over The Big Data
2	Research And Implementation Of Construction Algorithms For Closed Histogram Cube
3	Design And Implementation Of Closed Histogram Cube Based On Hadoop
4	Techniques Research For Data Cube Compression
5	Research And Implementation Of Distributed Cube Distributed Storage And Construction Algorithm
6	The Research Of Data Cube Incremental Calculation Method In OLAP
7	Research And Implementation Of Construction And Query Techniques Of Histogram Data Cube Based On Hadoop
8	Research On The Efficient Materialization And Fast Query Of Condensed Data Cube
9	Research On The Technology Of Label Cube
10	Cc-bitmaps: An Effective Index Technology Of The Closed Cube