Research On The Efficient Materialization And Fast Query Of Condensed Data Cube

Posted on:2012-08-24

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W Y Yan

GTID:1118330368984117

Subject:Computer software and theory

Abstract/Summary:

The data cube is the basic data model of online analytical processing (OLAP) servers. To improve the efficiency of OLAP queries, data cube construction has become the focus of many studies. In addition, a condensed data cube can be used to reduce the size of the cube data, which can significantly reduce both the computation time and the storage overhead of the data cube. In practice, data cubes are often materialized in advance to shorten OLAP query response time. It is therefore of great significance to study further how to compute complex data cubes, how to materialize condensed data cubes efficiently on different storage media, and how to use materialized data to answer queries quickly.

HierPrefixCube is proposed to solve the problems that dimension hierarchies introduce into data cube construction. Hierarchies bring two major problems. First, the number of nodes in the cube lattice increases dramatically and the lattice model becomes more complex, so a new lattice traversal algorithm is needed to make the computation efficient. Second, the number of data cube tuples to be materialized grows rapidly, so a new storage model is needed that eliminates all forms of redundancy and uses space effectively. PrefixCube was proposed as an efficient cube structure that augments BU-BST condensing with intra-cuboid prefix sharing; however, it does not support dimension hierarchies directly. We therefore extend the PrefixCube architecture to hierarchical data cubes, obtaining HierPrefixCube. HierPrefixCube not only achieves an efficient cube compression ratio but also strikes a good compromise among data cube compression, restoration, and querying.
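To illustrate why hierarchies make the cube lattice grow so quickly, the following minimal sketch enumerates the lattice nodes of a small schema. The dimension names and hierarchy levels are hypothetical and do not come from the dissertation. The sketch only applies the usual counting rule for hierarchical cubes (each dimension contributes one of its levels or the aggregated-away level ALL); it is not the HierPrefixCube traversal algorithm.

```python
from itertools import product

# Hypothetical schema: each dimension lists its hierarchy levels, finest first.
# The "ALL" level (dimension aggregated away) is appended automatically below.
HIERARCHIES = {
    "time": ["day", "month", "year"],
    "location": ["city", "country"],
    "product": ["item", "category"],
}

def lattice_nodes(hierarchies):
    """Return every cuboid as a tuple holding one chosen level per dimension."""
    per_dim = [levels + ["ALL"] for levels in hierarchies.values()]
    return list(product(*per_dim))

# Without hierarchies each dimension is either kept or ALL: 2^n cuboids.
flat = {name: levels[:1] for name, levels in HIERARCHIES.items()}
flat_nodes = lattice_nodes(flat)

# With hierarchies the count is the product of (levels + 1) over all dimensions.
hier_nodes = lattice_nodes(HIERARCHIES)

print(len(flat_nodes))  # 2 * 2 * 2 = 8
print(len(hier_nodes))  # 4 * 3 * 3 = 36
```

Even in this toy schema the lattice grows from 8 to 36 nodes, which suggests why both the traversal order and the storage model need rethinking once hierarchies are introduced.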
Experimental results show that, when answering queries based on dimension hierarchies, HierPrefixCube has a lower computation time, and its compression of the data cube size is also evident.

Precomputed and materialized data cubes can greatly shorten OLAP query response time. However, materialized data held in external storage still incurs many I/O operations. As memory prices decrease, materializing a subset of the cube data in memory becomes particularly suitable for OLAP with time constraints. Building on existing techniques, the tuple is therefore used as the unit of materialization to build a materialized-data selection model for condensed data cubes in memory. Provided that main memory is large enough to hold at least the finest-granularity cuboid, a two-level hash structure is adopted in memory so that the data cube need not be recomputed and queries are answered quickly and correctly. Queries are then optimized further and a better selection model is built. Because the finest-granularity tuples and other coarser-granularity tuples reside in main memory, time-consuming disk access is avoided and the update and maintenance costs are also reduced. Experimental results show that materializing data cubes in memory effectively reduces query response time, and that, among several selection models for materialized tuples in main memory, prioritising the smaller cuboids of the condensed data cube is time-optimal.

Materializing data cubes in memory reduces query response time, but because memory space is limited, the materialization requirements of larger cubes are difficult to meet. With the rapid development of flash memory technology, NAND-flash-based SSDs offer advantages such as higher access speed, lower power consumption, and lower cost.
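The in-memory scheme described above (a two-level hash structure backed by the finest-granularity cuboid) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the dimension names, the sample data, and the choice of SUM as the aggregate are all hypothetical, and the abstract does not specify how the two hash levels are actually keyed. Here the first level is keyed by cuboid and the second by the tuple's dimension values.

```python
from collections import defaultdict

DIMS = ("city", "product", "year")  # hypothetical dimension names

# Finest-granularity cuboid, assumed to fit in memory: full key -> measure.
BASE = {
    ("Wuhan", "pen", 2011): 5,
    ("Wuhan", "ink", 2011): 7,
    ("Beijing", "pen", 2012): 3,
}

class TwoLevelCubeCache:
    """Level 1 maps a cuboid (tuple of kept dimensions) to a level-2 table;
    level 2 maps a tuple's dimension values to its aggregate (SUM here)."""

    def __init__(self, base):
        self.base = base
        self.cuboids = defaultdict(dict)

    def materialize(self, dims, values, total):
        """Store one precomputed tuple; the tuple is the materialization unit."""
        self.cuboids[dims][values] = total

    def query(self, dims, values):
        table = self.cuboids.get(dims)
        if table is not None and values in table:
            return table[values]  # hit: answered from the hash tables
        # Miss: aggregate from the finest-granularity cuboid held in memory.
        idx = [DIMS.index(d) for d in dims]
        return sum(measure for key, measure in self.base.items()
                   if tuple(key[i] for i in idx) == values)

cache = TwoLevelCubeCache(BASE)
cache.materialize(("city",), ("Wuhan",), 12)
print(cache.query(("city",), ("Wuhan",)))   # 12, served from the hash tables
print(cache.query(("product",), ("pen",)))  # 8, aggregated from the base cuboid
```

Because the base cuboid is always available in memory, a miss never goes to disk; the selection model then only decides which coarser tuples are worth the extra memory.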
Combined with the tuple storage characteristics of the condensed data cube, a three-level storage structure of "Disk-Memory-NAND flash memory" is proposed. Because flash memory has unbalanced read, write, and erase latencies, cannot be updated in place, and endures only a limited number of erase cycles, a Multi Level Dynamic Perfect Hash index structure is used to index the materialized data cube tuples stored in flash memory. During materialization, write operations are transformed into a serialized sequence of operations, and data are inserted by appending, which avoids the "frequent write" problem. The final experimental results show that the data cube storage method based on the Dynamic Perfect Hash index structure not only responds to queries faster than disk storage but also avoids the problem of insufficient memory space.

Using materialized views to speed up queries is a common optimization method, and in multi-dimensional aggregation applications its essence is likewise to use materialized data cubes to shorten response time. The SPREADSHEET clause strengthens the multi-dimensional computation capability of traditional SQL. This dissertation studies materialized-view matching for the SPREADSHEET clause, improves the response time of spreadsheet queries by using materialized data, and gives an algorithm for matching materialized views against queries with a SPREADSHEET clause. Experimental results show that materialized-view matching with the SPREADSHEET clause speeds up queries effectively and scales well.
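The append-only write pattern described above for flash storage can be illustrated with a small sketch. It makes several assumptions that are not in the abstract: an ordinary file stands in for the flash device, records are JSON lines, and a plain Python dict replaces the Multi Level Dynamic Perfect Hash index. The class and file names are hypothetical. The point shown is only that an update becomes a new append plus an index change, never an in-place overwrite.

```python
import json
import os
import tempfile

class AppendOnlyTupleStore:
    """Records are only ever written at the end of the file; an in-memory
    dict maps each tuple key to the byte offset of its newest record."""

    def __init__(self, path):
        self.path = path
        self.index = {}            # key -> offset of the latest record
        open(path, "ab").close()   # create the file if it does not exist

    def put(self, key, value):
        line = json.dumps({"key": list(key), "value": value}) + "\n"
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()      # the new record starts at the current end
            f.write(line.encode("utf-8"))
        self.index[tuple(key)] = offset

    def get(self, key):
        offset = self.index.get(tuple(key))
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["value"]

with tempfile.TemporaryDirectory() as tmp:
    store = AppendOnlyTupleStore(os.path.join(tmp, "cube.log"))
    store.put(("Wuhan", "ALL", 2011), 12)
    store.put(("Wuhan", "ALL", 2011), 15)      # update = new append
    print(store.get(("Wuhan", "ALL", 2011)))   # 15, the newest record
    print(store.get(("Beijing", "ALL", 2012))) # None, key never written
```

Stale records stay in the file until some separate cleanup pass reclaims them, which is the usual trade-off of append-only designs and is not modeled here.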

Keywords/Search Tags:

Online analytical processing, Data cube, Condensed data cube, HierPrefixCube, Materialization, Query, Hash index

Related items

1	Techniques Research For Data Cube Compression
2	OLAP Algorithm Research Based On Dimension Hierarchy For Data Cube
3	The Online Mining Of Data Cube Gradient
4	Research On The Storage Of Condensed Cube Based On Flash Memory
5	Research On The Technology Of Label Cube
6	Novel techniques for data warehousing and online analytical processing in emerging applications
7	Research On Fast Data Cube Computation Method Based On Spark Platform
8	Research And Implementation Of Online Multiple Aggregation Query System Over The Big Data
9	Research Of Distributed Data Cube Partial Materialization Method Based On Genetic Algorithm
10	Design And Implementation Of Online Marketing Data Analysis Platform Based On The Materialized Data Cube