
Materialized view selection for multidimensional datasets

Posted on: 2000-02-01
Degree: Ph.D.
Type: Dissertation
University: The University of Wisconsin-Madison
Candidate: Shukla, Amit
GTID: 1468390014464145
Subject: Computer Science
Abstract/Summary:
This dissertation describes techniques for speeding up Online Analytical Processing (OLAP) queries. OLAP systems let users quickly obtain answers to complex business queries. Because these queries aggregate large amounts of data, answering them quickly calls for specialized techniques. One technique that OLAP systems use to speed up multidimensional analysis is to precompute aggregates over some subsets of the dimensions and their corresponding hierarchies.

We first address the problem of efficiently estimating aggregate sizes. Precomputing aggregate data improves query response time, but deciding what and how much to precompute is difficult. The decision is further complicated because precomputation in the presence of hierarchies can cause a counterintuitively large increase in the storage the database requires. It is therefore useful to estimate the storage blowup that a proposed set of precomputations would cause without actually computing them. We propose three strategies for this problem and investigate how accurately they estimate the blowup for different data distributions and database schemas.

Another intriguing problem is deciding which aggregates to precompute. The more that is precomputed, the faster queries can be answered. However, given a fixed amount of space, it is often difficult to determine which aggregates are best to precompute. We study the structure of this precomputation problem. We show that, under certain broad conditions on the multidimensional data, a simple and fast algorithm, PBS, achieves good performance bounds. An empirical study of PBS shows that it picks a surprisingly good set of aggregates even when these conditions do not hold.

Queries in real-world applications frequently require aggregation over multiple cubes (in a star schema, this corresponds to multiple fact tables). Unfortunately, most research on aggregate selection has assumed that queries are over a single cube. We analyze aggregate selection in the context of multicube queries. We propose algorithms that perform significantly better than previously proposed algorithms on multicube workloads, with no deterioration in performance on single-cube workloads.
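The following Python sketch makes the storage-blowup question concrete under stated assumptions. It enumerates every aggregate when each dimension has a linear hierarchy. Each dimension contributes one of its levels or ALL, so a dimension with h levels gives h + 1 choices. Three two-level dimensions therefore already yield 27 aggregates instead of 8. The sketch estimates each aggregate's size with the classic occupancy (Cardenas) formula m(1 - (1 - 1/m)^n), which assumes n tuples spread uniformly over m possible cells. The abstract does not say whether this analytical estimate is one of the three strategies the dissertation studies. The function names, the `dims` layout, and the cardinalities are hypothetical.

```python
from itertools import product
from math import expm1, log1p, prod

def expected_nonempty_cells(n_rows, n_cells):
    """Occupancy (Cardenas) estimate of the number of distinct cells hit when
    n_rows tuples fall uniformly at random into n_cells possible cells."""
    if n_cells == 1:
        return 1.0 if n_rows > 0 else 0.0
    # n_cells * (1 - (1 - 1/n_cells) ** n_rows), written with expm1/log1p so it
    # stays accurate when 1/n_cells is tiny.
    return -n_cells * expm1(n_rows * log1p(-1.0 / n_cells))

def aggregate_lattice(dims):
    """Yield every aggregate as a tuple of (dimension, level, cardinality)
    choices: each dimension is grouped at one hierarchy level or at ALL."""
    choices = [
        [(dim, level, card) for level, card in levels] + [(dim, "ALL", 1)]
        for dim, levels in dims.items()
    ]
    yield from product(*choices)

def estimated_blowup(dims, n_rows):
    """Estimated total rows of all aggregates divided by the fact-table size."""
    total = sum(
        expected_nonempty_cells(n_rows, prod(card for _, _, card in agg))
        for agg in aggregate_lattice(dims)
    )
    return total / n_rows

# Hypothetical schema: each dimension lists its levels from finest to coarsest.
dims = {
    "product": [("item", 1_000), ("category", 20)],
    "time": [("day", 365), ("month", 12)],
    "store": [("store", 50), ("region", 5)],
}
blowup = estimated_blowup(dims, n_rows=1_000_000)
```

The uniform assumption is the weak point of this kind of estimate. Skewed or correlated data typically occupies fewer cells than the formula predicts, which is why the accuracy comparison across data distributions matters.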
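The abstract does not define PBS. The sketch below rests on a labeled assumption: that PBS adds aggregates in increasing order of estimated size until a space budget is exhausted. It evaluates the result with a linear cost model, in which a group-by query costs the size of the smallest stored aggregate that contains all of its attributes. Dimensions have no hierarchies here, and the attribute names `a`, `b`, `c` and all sizes are hypothetical. Treat this as an illustration of a size-ordered selection rule, not as the dissertation's exact algorithm or cost model.

```python
def pick_by_size(size, base, budget):
    """Hypothetical PBS sketch: add aggregates in ascending size order until the
    space budget is exhausted. The base cube is assumed always stored and is not
    charged against the budget."""
    chosen, used = {base}, 0
    for view in sorted(size, key=size.get):
        if view == base:
            continue
        if used + size[view] > budget:
            break  # views are sorted by size, so no later view can fit either
        chosen.add(view)
        used += size[view]
    return chosen

def workload_cost(size, materialized):
    """Linear cost model: each group-by in the lattice is queried once and costs
    the size of the smallest materialized aggregate containing its attributes."""
    return sum(min(size[v] for v in materialized if q <= v) for q in size)

# Hypothetical lattice over attributes a, b, c; frozenset("ab") == {"a", "b"}.
size = {frozenset(k): s for k, s in {
    "": 1, "a": 100, "b": 50, "c": 20,
    "ab": 3_000, "ac": 1_500, "bc": 800, "abc": 10_000,
}.items()}
base = frozenset("abc")
chosen = pick_by_size(size, base, budget=200)
cost = workload_cost(size, chosen)
```

A size-ordered rule needs only one sort. Benefit-driven greedy methods, by contrast, re-evaluate the workload cost for every remaining candidate at every step.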
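For the multicube setting, the following sketch shows one way to let a selection algorithm see the whole workload rather than one cube at a time. It assumes a multicube query is answered by one group-by from each cube it touches. Each query costs the sum of those components under a linear cost model. Selection uses a generic greedy benefit-per-unit-space heuristic over the aggregates of all cubes, with one shared budget. This heuristic is a swapped-in textbook technique, not the algorithm proposed in the dissertation. The cube names, attributes, sizes, and queries are hypothetical.

```python
def view(cube, *attrs):
    """Hypothetical helper: an aggregate is (cube name, frozenset of group-by attributes)."""
    return (cube, frozenset(attrs))

def component_cost(cube, groupby, materialized, size):
    # Linear cost model within one cube: smallest materialized ancestor.
    return min(size[v] for v in materialized if v[0] == cube and groupby <= v[1])

def multicube_cost(queries, materialized, size):
    # Assumption: a multicube query costs the sum of its per-cube components.
    return sum(component_cost(c, g, materialized, size)
               for query in queries for c, g in query)

def greedy_multicube(queries, size, bases, budget):
    """Greedy benefit-per-unit-space selection over aggregates of all cubes with
    one shared budget; base cubes are assumed stored and not charged."""
    chosen, used = set(bases), 0
    while True:
        current = multicube_cost(queries, chosen, size)
        best, best_ratio = None, 0.0
        for v in size:
            if v in chosen or used + size[v] > budget:
                continue
            benefit = current - multicube_cost(queries, chosen | {v}, size)
            if benefit / size[v] > best_ratio:
                best, best_ratio = v, benefit / size[v]
        if best is None:  # nothing fits or nothing helps
            return chosen
        chosen.add(best)
        used += size[best]

size = {
    view("sales", "product", "time", "store"): 10_000,
    view("sales", "product", "time"): 2_000,
    view("sales", "product"): 100,
    view("inventory", "product", "time", "warehouse"): 8_000,
    view("inventory", "product", "time"): 1_500,
    view("inventory", "product"): 100,
}
bases = {view("sales", "product", "time", "store"),
         view("inventory", "product", "time", "warehouse")}
queries = [
    [view("sales", "product", "time"), view("inventory", "product", "time")],  # spans both cubes
    [view("sales", "product")],                                                # single-cube
]
chosen = greedy_multicube(queries, size, bases, budget=3_600)
```

Because the cost function sums over every component of every query, a multicube query credits an aggregate in one cube only for the part of the cost it actually removes. A per-cube selector with a pre-split budget cannot make that trade-off across cubes.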
Keywords/Search Tags: Data, Queries, OLAP, Selection, Multidimensional