Optimization and generalization of OLAP cube processing in relational database systems

Posted on: 2012-06-12 | Degree: Ph.D | Type: Dissertation
University: University of Houston | Candidate: Zhibo, Chen | Full Text: PDF
GTID: 1468390011464246 | Subject: Computer Science
Abstract/Summary:
On-Line Analytical Processing (OLAP) is a family of database algorithms and techniques used to compute multiple aggregation queries on an input fact table. Because it is generally accepted that OLAP queries are both slow and hard to optimize when evaluated within a DBMS, a significant portion of cube computation is often performed outside the database system. This requires exporting the data set, which is known to be very expensive for large data sets. In this dissertation, we challenge this traditional practice with three technical contributions at three different levels. First, at the lowest level, we improve OLAP data cube pre-computation with innovative memory-only data structures that can be seamlessly integrated within a user-defined function, an extensibility mechanism provided by SQL. Second, we extend OLAP exploratory operations by introducing horizontal aggregations, a novel operation in SQL that combines pivoting and multi-dimensional aggregation of the data set into a single query. Such an operation is essential as a pre-processing step for data mining algorithms, which generally do not accept a data set with a vertical layout. We show that horizontal aggregations are faster and more flexible than a built-in pivot operator. Finally, treating OLAP as a data mining tool, we show that it can work synergistically with statistics. By combining OLAP cubes with a parametric statistical test, we develop Cube Statistical Tests, a novel analytic algorithm that isolates predictive attributes. This is accomplished with similarity comparisons between two cube subgroups based on a probabilistic function. In addition, we study the validity of discovered patterns by performing a reliability analysis against Association Rule Pairs, an extension of the standard pattern discovery algorithm. We show that Cube Statistical Tests are superior in terms of the patterns discovered.
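To make the contrast between a vertical layout and a horizontal aggregation concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes a hypothetical fact table `sales(store, quarter, amount)` and emulates a horizontal aggregation with standard `SUM(CASE ...)` expressions in SQLite; the table, column names, and helper functions are illustrative only.

```python
import sqlite3

# Hypothetical fact table: one row per (store, quarter, amount) sale.
ROWS = [
    ("s1", "Q1", 10.0),
    ("s1", "Q1", 5.0),
    ("s1", "Q2", 7.0),
    ("s2", "Q1", 3.0),
    ("s2", "Q2", 4.0),
    ("s2", "Q2", 6.0),
]


def build_db():
    """Create an in-memory database holding the illustrative fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
    return conn


def vertical(conn):
    """Standard aggregation: one output row per (store, quarter) group."""
    return conn.execute(
        "SELECT store, quarter, SUM(amount) FROM sales "
        "GROUP BY store, quarter ORDER BY store, quarter"
    ).fetchall()


def horizontal(conn):
    """Pivot and aggregate in one query: one row per store, one column per quarter."""
    return conn.execute(
        "SELECT store, "
        "SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1, "
        "SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2 "
        "FROM sales GROUP BY store ORDER BY store"
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(vertical(conn))
    print(horizontal(conn))
```

The horizontal result has one row per store, which is the tabular shape that data mining algorithms typically expect. The quarter values are hard-coded here for brevity; a general version would have to discover the distinct pivot values first.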
Keywords/Search Tags: OLAP, Data, Cube