Font Size: a A A

Improved query processing and data representation techniques

Posted on:2000-09-04Degree:Ph.DType:Thesis
University:The University of Wisconsin - MadisonCandidate:Goldstein, Jonathan DavidFull Text:PDF
GTID:2468390014464085Subject:Computer Science
Abstract/Summary:
This thesis presents new research on two topics: compression in relational database systems, and nearest neighbor processing techniques.; With the increasing speed of CPUs relative to disks, using compression as a means of improving disk information throughput is becoming very attractive. While it might at first seem reasonable to use traditional compression algorithms such as Lempel-Ziv, these algorithms are unsuitable because they require uncompressing a large portion of the file to retrieve a small piece of data.; Motivated by this observation, we have developed a compression algorithm for fixed length data that overcomes this problem. The algorithm is simple, and can easily be added to the file management layer of a DBMS since it supports the usual technique of identifying a record by a ⟨pageid, slotid ⟩ pair (tuple ID). In addition, this algorithm compresses hyper-rectangle based indexing structures such as R-Trees.; This thesis also describes the relationship between sort orders, multidimensional bulk loading, and compression ratios. Also included in this thesis is an extensive suite of experiments that show the applicability of our compression technique, especially for data warehousing workloads.; In recent years, many researchers have focused on finding efficient solutions to the nearest neighbor problem. While there have been many efforts to find faster than linear scan processing strategies for these tasks, there has been no success at solving the general problem.; While previous work shows that the problem can't be solved efficiently in general, we present a technique for either in memory or secondary storage that is guaranteed to perform well in the “good” situations previously described. Furthermore, it is the first strategy that allows a database administrator to trade off space for execution time. It is also the first nearest neighbor strategy whose behavior is completely understood. Finally, there are variants of the problem that perform far better than alternative techniques on “hard” cases.
Keywords/Search Tags:Technique, Data, Processing, Nearest neighbor, Compression, Problem
Related items