Efficient database support for OLAP queries (On-line analytical processing)

Posted on:2001-09-16

Degree:Ph.D

Type:Thesis

University:The University of Wisconsin - Madison

Candidate:Deshpande, Prasad Manikarao

Full Text:PDF

GTID:2468390014452370

Subject:Computer Science

Abstract/Summary:

Computing multidimensional aggregates is a performance bottleneck for OLAP and multi-dimensional data analysis applications. This thesis addresses various problems that arise while speeding up multi-dimensional queries. The first problem we consider deals with computing the CUBE operator proposed by Gray et al. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations, and present a class of sorting-based algorithms that overlap the computation of different group-by operations using the least possible memory for each computation. Experiments show that the method dramatically outperforms the straightforward implementation of CUBE as a sequence of SQL group-by operations.; The second and third part of the thesis deal with caching for multi-dimensional queries. We developed a novel scheme where we cache small regions of the multi-dimensional space called “chunks”. Chunk-based caching allows fine granularity caching, and allows queries to partially reuse the results of previous queries with which they overlap. To facilitate the computation of chunks required by a query but missing from the cache, we propose a new organization for relational tables, which we call a “chunked file.” Our experiments show that for workloads that exhibit query locality, chunked caching combined with the chunked file organization performs better than traditional query level caching. Simple caching unfortunately misses the dramatic performance improvements obtainable when the answer to a query, while not immediately available in the cache, can be computed by aggregating other data in the cache. In order to use aggregation, one must solve two subproblems: (1) determining whether it is possible, and (2) determining the fastest path for this aggregation, since there can be many. We present a naive and a Virtual Count based strategy. The virtual count based methods determine if a query is computable from the cache and the fastest path for this computation almost instantaneously, with a small overhead of maintaining the summary state of the cache. Experiments show that aggregation in the cache leads to substantial performance improvement. The virtual count based methods further improve the performance compared to the naive approaches, in terms of cache lookup and aggregation times.

Keywords/Search Tags:

Cache, Performance, Queries, Multi-dimensional, Aggregation

Related items

1	Research On Multi-Queries System Framework And Aggregation Techniques In Distributed Sensor Database
2	Efficient Algorithms For Approximate Aggregation And Nearest Neighbor Queries Over Multi-Dimensional Data
3	Research Of Multi-dimensional Qos Model Supporting Web Service Queries
4	Design And Implementation Of OPC UA Multi-Server Aggregation Software
5	Massive Data Aggregation And Parallel Implementation With Complex Constraints
6	LambdaRAM: A high-performance, multi-dimensional, distributed cache over ultra-high speed networks
7	Key Research Issues Of Memory Architecture For Three Dimensional Multi-Core Processors
8	Information Retrieval Techniques Based On Multi-dimensional Queries
9	Research On Data Queries And Processing Techniques In Moving Objects Databases
10	Continuous monitoring of multi-dimensional queries