Font Size: a A A

Data transformation for improved query performance

Posted on:2013-05-08Degree:Ph.DType:Dissertation
University:Michigan State UniversityCandidate:Watve, AlokFull Text:PDF
GTID:1458390008983778Subject:Computer Science
Abstract/Summary:
A database management system stores data in order to facilitate fast and efficient retrieval while executing queries. The typical queries that are run in a database can be classified into three broad categories. They are range queries, k-nearest neighbor (k-NN) queries and box queries. Implementation of a box query typically involves simple comparison of values in each dimension of the query feature vector with the corresponding dimension of an object in the database. Implementing range queries and k-NN queries, however, may be more involved due to computation of distances among the points. In this dissertation, we study mapping of one type of the query on to the other. From performance perspective, an index structure may favor one type of query over other. Hence, such a mapping provides a way of improving query performance and exploiting available system capabilities. It also highlights the relationships among the various types of queries.;Our first transformation maps a range query in L 1 space on to a box query. Since index pages of R-tree based indexing schemes are geometrically rectangles, this mapping provides a similar interface between the query space and the data space of each of the index page. In 2-dimensional space, the mapping is exact. However, it cannot be used directly for higher dimensional spaces. We propose a novel approach called disjoint planar rotation in order to extend the transformation in higher dimensions. We also develop a new type of box query, called pruning box query, which is equivalent to the range query in the original space. Our theoretical analysis shows that this mapping can improve I/O performance of the queries. Further, performance improvement increases with increasing number of dimensions. Due to the underlying similarity between range queries and k-NN queries, the proposed transformation can also be used to improve performance of k-NN queries. We also present a transformation to map box queries on to range queries in L1 space. The inherent property of box queries to allow varying degree of selectivity along each dimension, poses some challenges for the transformation. We propose square tiling approach to map each box query on to a number of square box queries. Each of the square box queries can then be transformed into a range query. We demonstrate the effectiveness of this mapping using M-Tree.;Euclidean distance (or L2 norm) is another popular distance measure. While exact mapping of range queries in Euclidean space to box queries may be challenging, using vantage point based indexing, it is possible to transform range queries into bounding box queries. Since, execution of bounding box queries is computationally more efficient than that of range queries, the transformation from range queries to bounding box queries can be used to improve query performance (i.e. the number of distance computations) in main memory databases. Existing work on vantage point based indexing uses data points from the database as vantage points. As a result the database becomes static and cannot allow dynamic insertions, deletions and updates. Further, Computational complexity of these vantage point selection schemes depends on the size of the database which can be a problem for large databases. We analyze the impact of vantage points on false positives and the number of duplicates; and present a heuristic algorithm for selecting vantage points in closed data spaces. As the vantage point selection is independent of data, the database allows dynamic insertions and deletions. Comprehensive experimental evaluation with several synthetic and real databases shows effectiveness of the proposed vantage point selection scheme.
Keywords/Search Tags:Data, Queries, Query, Vantage point selection, Transformation, Performance, Improve
Related items