Performance issues of multi-dimensional data analysis

Posted on: 1999-01-15
Degree: Ph.D.
Type: Thesis
University: The University of Wisconsin - Madison
Candidate: Zhao, Yihong
Full Text: PDF
GTID: 2468390014969970
Subject: Computer Science
Abstract/Summary:
This thesis investigates performance issues that arise in multi-dimensional data analysis. It addresses three closely related problems in this area by designing high-performance storage structures and algorithms. Recently, On-Line Analytical Processing (OLAP) has become the most widely used [...] and that in many cases the array ADT can provide significantly higher performance than can be obtained by applying traditional techniques such as bitmap indices and star-join algorithms to relational tables.
In the second part of the thesis, we designed and implemented array-based algorithms for cubing, which is one of the core operations of OLAP. Cubing computes group-by aggregations over all possible subsets of the specified dimensions. Because the data is stored in a fundamentally different way, our position-based algorithms take a markedly different approach to computing multiple aggregates than traditional value-based aggregation methods. The algorithms compute a cube from data residing on the Unix file system. The performance study shows that the array-based cubing algorithms are significantly faster than the leading relational counterparts.
In the third part of the thesis, we study the optimization and evaluation of multiple related OLAP queries. This problem has become increasingly important since Microsoft proposed its "OLE DB for OLAP" standard. OLE DB for OLAP defines Multi-Dimensional Expressions (MDX), which have the interesting and challenging feature of allowing clients to ask several related dimensional queries in a single MDX expression. To solve the problem, we developed two algorithms that generate a good global plan by using a proper set of precomputed aggregates. Furthermore, we designed and implemented a set of new query primitives that let multiple queries share portions of their evaluation.
We implemented these algorithms and three new query operators in the same version of the Paradise DBMS that was used in the first part of the thesis, and we ran our performance tests on Paradise.
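As a rough illustration of what cubing computes, the following Python sketch sums a small dense array over every subset of its dimensions in a single scan. It is not the thesis's algorithm: the function name `cube`, the flat row-major layout, and the sample `sales` data are assumptions made for this example. It only shows the position-based idea in its simplest form, where a cell's coordinates are derived from its position in the array rather than from stored key values.

```python
from itertools import combinations, product


def cube(cells, shape):
    """Compute SUM for every subset of dimensions of a dense array.

    cells: flat, row-major list of measures (hypothetical layout).
    shape: size of each dimension.
    Returns {kept_axes: {coords_on_kept_axes: sum}}.
    """
    n = len(shape)
    # Every subset of the dimensions is one group-by: 2**n in total.
    group_bys = [kept for k in range(n + 1) for kept in combinations(range(n), k)]
    result = {kept: {} for kept in group_bys}
    # One scan of the array; coordinates come from position, not stored keys.
    positions = product(*(range(size) for size in shape))
    for coords, value in zip(positions, cells):
        for kept in group_bys:
            key = tuple(coords[axis] for axis in kept)
            result[kept][key] = result[kept].get(key, 0) + value
    return result


# Hypothetical 2 x 3 x 4 array with measures 0..23.
sales = list(range(24))
agg = cube(sales, (2, 3, 4))
```

Here `agg[()]` holds the grand total, `agg[(0,)]` the totals per value of the first dimension, and `agg[(0, 1, 2)]` the original cells. This naive version updates every group-by for every cell; a practical implementation would look for ways to share work between group-bys.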
Keywords/Search Tags:Performance, Multi-dimensional, Issues, Thesis, Data, OLAP, Algorithms