
Multi-dimensional nearest neighbor searching with low-dimensional data

Posted on: 2002-04-28
Degree: Ph.D
Type: Dissertation
University: University of Maryland College Park
Candidate: Maneewongvatana, Songrit
Full Text: PDF
GTID: 1468390011497379
Subject: Computer Science
Abstract/Summary:
In nearest neighbor searching we are given a set of points in space, and the goal is to preprocess these points into a data structure so that, given any query point, the nearest point can be reported quickly. It is well known that the efficiency of nearest neighbor searching usually degrades as the dimension of the space increases. In high dimensions, this can mean that even a sophisticated data structure yields no faster search times than brute-force search. But the dimension of the space is not the only factor determining search efficiency. We believe that the distribution of the data set plays a significant role as well. We focus on data sets that are clustered in low-dimensional subspaces, and we limit our scope to memory-resident data structures.

We first study some variants of the kd-tree. These variants arise from changing the kd-tree's splitting rule. We present an extensive performance study of two new splitting rules: sliding-midpoint and minimum-ambiguity. Both improve on the standard splitting method when the data are low-dimensional and clustered. We also provide a theoretical analysis supporting the improvement in search efficiency when the data points are distributed in a low-dimensional subspace.

A popular way to improve search performance is approximate nearest neighbor search. However, a drawback of this method is its low predictability of errors: the search is given an allowable error tolerance ε, but the average error actually committed is often much smaller than ε. We introduce a novel data structure for nearest neighbor search called the os-tree. It is designed around a different error model, in which the user provides a small failure probability pf, and the search fails to return the true nearest neighbor with probability at most pf. The os-tree is built with the aid of a set of training points.
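The sliding-midpoint rule mentioned above splits a kd-tree cell at the midpoint of its longest side; if the data are so skewed that one child would be empty, the splitting plane "slides" to the nearest data point. The following is a minimal Python sketch of one such split, not the dissertation's implementation; the function name, the list-of-tuples point representation, and the explicit cell bounds are illustrative choices:

```python
def sliding_midpoint_split(points, cell_lo, cell_hi):
    """One split under the sliding-midpoint rule (illustrative sketch).

    points:  list of equal-length coordinate tuples inside the cell.
    cell_lo, cell_hi: per-dimension lower/upper bounds of the cell.
    Returns (axis, split_value, left_points, right_points).
    """
    dim = len(cell_lo)
    # Split across the longest side of the cell, at its midpoint.
    axis = max(range(dim), key=lambda d: cell_hi[d] - cell_lo[d])
    split = (cell_lo[axis] + cell_hi[axis]) / 2.0

    left = [p for p in points if p[axis] <= split]
    right = [p for p in points if p[axis] > split]

    # If the points all cluster on one side of the midpoint, slide the
    # plane to the nearest point so that neither child is empty.
    if not left:
        split = min(p[axis] for p in right)
        left = [p for p in points if p[axis] <= split]
        right = [p for p in points if p[axis] > split]
    elif not right:
        split = max(p[axis] for p in left)
        right = [p for p in points if p[axis] >= split]
        left = [p for p in points if p[axis] < split]
    return axis, split, left, right

# Example: a cell [0,10]^2 whose points all cluster near one corner.
axis, split, left, right = sliding_midpoint_split(
    [(8.5, 9.0), (9.0, 9.5), (9.5, 9.0)], (0.0, 0.0), (10.0, 10.0))
# The midpoint (x = 5.0) would leave the left child empty, so the plane
# slides right to x = 8.5, separating one point from the other two.
```

The slide is what keeps the tree well behaved on clustered data: a plain midpoint rule can produce long chains of empty cells, while sliding guarantees every split separates at least one point. (A sliding-midpoint rule of this kind later became the basis of the kd-tree construction in SciPy's `scipy.spatial.KDTree`.)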
We implemented the os-tree and provide both a theoretical and an empirical analysis of its search performance. We show that the os-tree achieves search efficiency as good as that of current methods, with a significantly better ability to predict errors.
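For contrast with the os-tree's failure-probability model, the ε-approximate guarantee discussed earlier says the reported point may be farther than the true nearest neighbor, but by no more than a factor of (1 + ε). A small pure-Python sketch of that guarantee follows; the helper names (`dist`, `exact_nn`, `approx_ok`) and the random test data are illustrative, not from the dissertation:

```python
import math
import random

def dist(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def exact_nn(points, q):
    """Brute-force nearest neighbor: the baseline any structure must beat."""
    return min(points, key=lambda p: dist(p, q))

def approx_ok(points, q, answer, eps):
    """Does `answer` satisfy the (1 + eps)-approximate guarantee for q?"""
    return dist(q, answer) <= (1.0 + eps) * dist(q, exact_nn(points, q))

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
q = (0.5, 0.5)
# The exact answer trivially satisfies the guarantee for any eps >= 0;
# an approximate search is free to return *any* point passing this test,
# which is why the error actually committed is often far below eps.
assert approx_ok(pts, q, exact_nn(pts, q), eps=0.0)
```

The slack in this test is exactly the predictability problem the abstract raises: ε bounds the worst case, not the typical error, whereas the os-tree's parameter pf directly bounds how often the answer is wrong at all.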
Keywords/Search Tags: Search, Nearest neighbor, Data, Low-dimensional, Os-tree, Points