Improved query processing and data representation techniques

Posted on:2000-09-04

Degree:Ph.D

Type:Thesis

University:The University of Wisconsin - Madison

Candidate:Goldstein, Jonathan David

Full Text:PDF

GTID:2468390014464085

Subject:Computer Science

Abstract/Summary:

This thesis presents new research on two topics: compression in relational database systems, and nearest neighbor processing techniques.; With the increasing speed of CPUs relative to disks, using compression as a means of improving disk information throughput is becoming very attractive. While it might at first seem reasonable to use traditional compression algorithms such as Lempel-Ziv, these algorithms are unsuitable because they require uncompressing a large portion of the file to retrieve a small piece of data.; Motivated by this observation, we have developed a compression algorithm for fixed length data that overcomes this problem. The algorithm is simple, and can easily be added to the file management layer of a DBMS since it supports the usual technique of identifying a record by a ⟨pageid, slotid ⟩ pair (tuple ID). In addition, this algorithm compresses hyper-rectangle based indexing structures such as R-Trees.; This thesis also describes the relationship between sort orders, multidimensional bulk loading, and compression ratios. Also included in this thesis is an extensive suite of experiments that show the applicability of our compression technique, especially for data warehousing workloads.; In recent years, many researchers have focused on finding efficient solutions to the nearest neighbor problem. While there have been many efforts to find faster than linear scan processing strategies for these tasks, there has been no success at solving the general problem.; While previous work shows that the problem can't be solved efficiently in general, we present a technique for either in memory or secondary storage that is guaranteed to perform well in the “good” situations previously described. Furthermore, it is the first strategy that allows a database administrator to trade off space for execution time. It is also the first nearest neighbor strategy whose behavior is completely understood. Finally, there are variants of the problem that perform far better than alternative techniques on “hard” cases.

Keywords/Search Tags:

Technique, Data, Processing, Nearest neighbor, Compression, Problem

Related items

1	Study On Generalized Nearest Neighbor Pattern Classification
2	Research On Optimal-Nearest-Neighbor And Reverse Visible Nearest Neighbor Queries
3	Research On The Key Technologies And The Applications For The Class Of Imbalance Problem
4	Research Of Local Sensitive Hash Index Based On Nearest Neighbor Graph
5	Research On Near Neighbor Query Of Hybrid Data In Spatial Database
6	Design And Implementation Of Uncertain Data Neighbor Search Algorithm
7	Studies On Probabilistic Nearest Neighbor Queries Over Uncertain Data
8	Classification Of Uncertain Data Based On Nearest Neighbor
9	Nearest Neighbor Query Technique Of Spatio-temporal Road Network
10	The Research Of K-nearest Neighbor Query Of Spatial Data Based On Voronoi Diagram