Font Size: a A A

Efficient database support for OLAP queries (On-line analytical processing)

Posted on:2001-09-16Degree:Ph.DType:Thesis
University:The University of Wisconsin - MadisonCandidate:Deshpande, Prasad ManikaraoFull Text:PDF
GTID:2468390014452370Subject:Computer Science
Abstract/Summary:
Computing multidimensional aggregates is a performance bottleneck for OLAP and multi-dimensional data analysis applications. This thesis addresses various problems that arise while speeding up multi-dimensional queries. The first problem we consider deals with computing the CUBE operator proposed by Gray et al. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations, and present a class of sorting-based algorithms that overlap the computation of different group-by operations using the least possible memory for each computation. Experiments show that the method dramatically outperforms the straightforward implementation of CUBE as a sequence of SQL group-by operations.; The second and third part of the thesis deal with caching for multi-dimensional queries. We developed a novel scheme where we cache small regions of the multi-dimensional space called “chunks”. Chunk-based caching allows fine granularity caching, and allows queries to partially reuse the results of previous queries with which they overlap. To facilitate the computation of chunks required by a query but missing from the cache, we propose a new organization for relational tables, which we call a “chunked file.” Our experiments show that for workloads that exhibit query locality, chunked caching combined with the chunked file organization performs better than traditional query level caching. Simple caching unfortunately misses the dramatic performance improvements obtainable when the answer to a query, while not immediately available in the cache, can be computed by aggregating other data in the cache. In order to use aggregation, one must solve two subproblems: (1) determining whether it is possible, and (2) determining the fastest path for this aggregation, since there can be many. We present a naive and a Virtual Count based strategy. The virtual count based methods determine if a query is computable from the cache and the fastest path for this computation almost instantaneously, with a small overhead of maintaining the summary state of the cache. Experiments show that aggregation in the cache leads to substantial performance improvement. The virtual count based methods further improve the performance compared to the naive approaches, in terms of cache lookup and aggregation times.
Keywords/Search Tags:Cache, Performance, Queries, Multi-dimensional, Aggregation
Related items