Font Size: a A A

Applied spatial data structures for large data sets

Posted on:2004-12-01Degree:Ph.DType:Dissertation
University:The Johns Hopkins UniversityCandidate:Chaudhary, AmitabhFull Text:PDF
GTID:1468390011466220Subject:Computer Science
Abstract/Summary:
Spatial data structures for large data sets need to be particularly efficient. We introduce new data structures for two problems that involve large data sets.; We introduce Parameterized Balanced Aspect Ratio (PBAR) trees. They are instances of Binary Space Partition trees and to partition the space they accept any three partitioning vectors. All partitions are made with hyperplanes orthogonal to these vectors. Given a set of n points in 2-dimensional space, a PBAR tree can be constructed in O(n log n) time such that the depth of the tree is O (log n) and the aspect ratio of each region is bounded by a constant that only depends upon the angles between the partitioning vectors. Like previously introduced BAR trees, that too have bounded aspect ratio, PBAR trees have good upper bounds on the time taken to solve a number of approximate spatial queries like the (1 + ε)-nearest neighbor query and the ε-range query. However, since the bounds on their aspect ratio are better than those for BAR trees, they are expected to perform better at solving these queries. We present empirical results of tests with the (1 + ε)-nearest neighbor query that show PBAR trees outperforming BAR trees.; We also introduce a special form of k-d trees that are very fast in detecting outliers from large data sets in multidimensional spaces. Outliers are data objects that do not comply with the general behavior of the data. Our scheme does not assume any probability model and is linear in the number of objects and in the number of dimensions. It also does not assume the existence of a full dimensional distance metric in the input space. It is suitable for many applications including detecting outliers in astronomical databases—for which none of the existing schemes work. We present empirical results that show our scheme is, in fact, a practical solution.
Keywords/Search Tags:Large data sets, BAR trees, Aspect ratio, PBAR
Related items