
Managing large multidimensional datasets inside a database system

Posted on: 2002-02-05
Degree: Ph.D
Type: Thesis
University: University of Illinois at Urbana-Champaign
Candidate: Chakrabarti, Kaushik
Full Text: PDF
GTID: 2468390011490196
Subject: Computer Science
Abstract/Summary:
This thesis develops techniques to manage large amounts of multidimensional data inside a database system. To handle multidimensional data efficiently, we need access methods (AMs) that selectively and associatively access some data items in a large collection. Commercial databases lag far behind in their support for multidimensional access methods. In this thesis, we design and implement the hybrid tree, a multidimensional index structure that scales to high-dimensional spaces. The hybrid tree combines the positive aspects of the two types of multidimensional index structures, namely data partitioning (e.g., R-tree and derivatives) and space partitioning (e.g., kdB-tree and derivatives), to achieve search performance that scales to high dimensionalities better than either technique alone. Our experiments show that the hybrid tree scales well to high dimensionalities for real-life datasets.

To achieve further scalability, we develop the local dimensionality reduction (LDR) technique to reduce the dimensionality of high-dimensional data. LDR exploits local, as opposed to global, correlations in the data and hence can reduce dimensionality with significantly lower loss of distance information than global dimensionality reduction techniques. This implies fewer false positives and hence better search performance.

To enable efficient similarity search on time series data, we develop a dimensionality reduction technique for time series, called Adaptive Piecewise Constant Approximation (APCA). APCA adapts locally to each time series object in the database and chooses the best reduced representation for that object. We show how the APCA representation can be indexed using a multidimensional index structure. Our experiments show that APCA outperforms the other techniques by one to two orders of magnitude in terms of search performance.

Before multidimensional index structures can be supported as AMs in “commercial-strength” database systems, efficient techniques to provide transactional access to data via the index structure must be developed. We develop concurrency control techniques for multidimensional index structures.

To handle the huge data volumes and fast response time requirements of decision support applications, we develop an approximate query processing technique based on multidimensional wavelets. Our technique constructs compact synopses (comprising wavelet coefficients) of the relevant database tables and subsequently answers any SQL query by working exclusively on the compact synopses. Our approach provides more accurate answers and faster response times than other approximate query answering techniques.
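
The wavelet-synopsis idea in the last part of the abstract can be illustrated with a small one-dimensional sketch. This is not the thesis's implementation: the thesis works with multidimensional wavelets and answers queries directly on the synopsis, whereas this example assumes a single numeric column whose length is a power of two, uses the plain Haar transform, and expands the synopsis before summing for simplicity. The function names (`haar_decompose`, `build_synopsis`, `approx_range_sum`) and the sample values are hypothetical.

```python
import math


def haar_decompose(values):
    """Unnormalized 1-D Haar transform: [overall average, detail coefficients...]."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs


def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        expanded = []
        for avg, det in zip(out[:length], out[length:2 * length]):
            expanded.extend((avg + det, avg - det))
        out[:2 * length] = expanded
        length *= 2
    return out


def build_synopsis(values, k):
    """Keep the k coefficients with the largest support-weighted magnitude."""
    coeffs = haar_decompose(values)
    n = len(coeffs)

    def weight(i):
        # A coefficient at level l influences n / 2**l values.
        support = n if i == 0 else n // (2 ** int(math.log2(i)))
        return abs(coeffs[i]) * math.sqrt(support)

    kept = sorted(range(n), key=weight, reverse=True)[:k]
    return {i: coeffs[i] for i in kept}


def approx_range_sum(synopsis, n, lo, hi):
    """Approximate sum(values[lo:hi]) using only the retained coefficients."""
    dense = [0.0] * n
    for i, c in synopsis.items():
        dense[i] = c
    return sum(haar_reconstruct(dense)[lo:hi])


if __name__ == "__main__":
    data = [2, 2, 0, 2, 3, 5, 4, 4]          # illustrative values only
    synopsis = build_synopsis(data, k=4)      # half the original storage
    print("exact :", sum(data[0:2]))
    print("approx:", approx_range_sum(synopsis, len(data), 0, 2))
```

Weighting each coefficient by the square root of its support before ranking is one common way to choose which coefficients to keep; other retention criteria would fit the same structure.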
Keywords/Search Tags:Multidimensional, Data, Techniques, Large, APCA, Time, Develop