Managing large multidimensional datasets inside a database system

Posted on:2002-02-05

Degree:Ph.D

Type:Thesis

University:University of Illinois at Urbana-Champaign

Candidate:Chakrabarti, Kaushik

GTID:2468390011490196

Subject:Computer Science

Abstract/Summary:

This thesis develops techniques to manage large amounts of multidimensional data inside a database system. To handle multidimensional data efficiently, we need access methods (AMs) that selectively and associatively access data items in a large collection. Commercial databases lag far behind in their support for multidimensional access methods. In this thesis, we design and implement the hybrid tree, a multidimensional index structure that scales to high-dimensional spaces. The hybrid tree combines the positive aspects of the two types of multidimensional index structures, namely data partitioning (e.g., R-tree and derivatives) and space partitioning (e.g., kdB-tree and derivatives), to achieve search performance that scales to high dimensionalities better than either technique alone. Our experiments show that the hybrid tree scales well to high dimensionalities for real-life datasets.

To achieve further scalability, we develop the local dimensionality reduction (LDR) technique to reduce the dimensionality of high-dimensional data. LDR exploits local, as opposed to global, correlations in the data and hence can reduce dimensionality with significantly lower loss of distance information than global dimensionality reduction techniques. This implies fewer false positives and hence better search performance.

To enable efficient similarity search on time series data, we develop a dimensionality reduction technique for time series called Adaptive Piecewise Constant Approximation (APCA). APCA adapts locally to each time series object in the database and chooses the best reduced representation for that object. We show how the APCA representation can be indexed using a multidimensional index structure. Our experiments show that APCA outperforms the other techniques by one to two orders of magnitude in terms of search performance.

Before multidimensional index structures can be supported as AMs in “commercial-strength” database systems, efficient techniques to provide transactional access to data via the index structure must be developed. We develop concurrency control techniques for multidimensional index structures.

To handle the huge data volumes and fast response time requirements of decision support applications, we develop an approximate query processing technique based on multidimensional wavelets. Our technique constructs compact synopses (comprising wavelet coefficients) of the relevant database tables and subsequently answers any SQL query by working exclusively on the compact synopses. Our approach provides more accurate answers and faster response times than other approximate query answering techniques.
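
The idea of a wavelet synopsis can be illustrated with a minimal sketch. It is not the thesis's method: it assumes a one-dimensional array whose length is a power of two, uses the unnormalized Haar wavelet, and answers only a range-sum query by reconstructing approximate values, whereas the abstract describes multidimensional wavelets and query processing directly on the synopses. All function names (`haar_decompose`, `build_synopsis`, `approx_range_sum`) are hypothetical.

```python
import math


def haar_decompose(values):
    """Unnormalized Haar transform: [overall average, coarsest detail, ..., finest details]."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two (assumption of this sketch)")
    current = [float(v) for v in values]
    details = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser levels end up in front
        current = averages
    return current + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(current)]
        expanded = []
        for avg, diff in zip(current, diffs):
            expanded.extend((avg + diff, avg - diff))
        pos += len(current)
        current = expanded
    return current


def build_synopsis(values, k):
    """Keep the k coefficients with the largest support-weighted magnitude."""
    coeffs = haar_decompose(values)
    n = len(coeffs)

    def weight(i):
        level = i.bit_length() - 1 if i else 0  # index 1 -> level 0, 2..3 -> level 1, ...
        support = n >> level                    # number of cells the coefficient covers
        return abs(coeffs[i]) * math.sqrt(support)

    kept = sorted(range(n), key=weight, reverse=True)[:k]
    return n, {i: coeffs[i] for i in kept}


def approx_range_sum(synopsis, lo, hi):
    """Approximate sum of values[lo..hi] using only the retained coefficients."""
    n, kept = synopsis
    coeffs = [kept.get(i, 0.0) for i in range(n)]  # dropped coefficients count as zero
    return sum(haar_reconstruct(coeffs)[lo:hi + 1])


values = [2, 2, 0, 2, 3, 5, 4, 4]
synopsis = build_synopsis(values, k=4)
print("exact:", sum(values[0:3]), "approx:", approx_range_sum(synopsis, 0, 2))
```

The synopsis stores `k` coefficients instead of all `n` values, so the trade-off between space and accuracy is controlled by `k`; with `k` equal to the array length the answers are exact.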

Keywords/Search Tags:

Multidimensional, Data, Techniques, Large, APCA, Time, Develop

Related items

1	Techniques for approximating optimal linear estimators of multidimensional data
2	The Optimization Research Of Index And Data Organization For The Query Of Large Data Set
3	Research And Implementation Of Multidimensional Time-Series Data Mining Methods
4	Data structures and techniques for visualization of large volumetric carbon dioxide datasets in a real time experience
5	Research On Structure Model From Visualization Design Of Multidimensional Data Interface
6	Distributed multidimensional indexing for scientific data analysis applications
7	Develop And Research Of Business Sale Decision Support System
8	Using game theory techniques and concepts to develop proprietary models for use in intelligent games
9	The use of reaction time, N200 and P300 latency, movement time and accuracy data in localizing the effects of amphetamine and ethanol on stages of information processing and energetical mechanisms: Implications for uni- and multidimensional views of human
10	Research On Multidimensional Data Visualization In Data Mining