
Multi-dimensional nearest neighbor searching with low-dimensional data

Posted on: 2002-04-28
Degree: Ph.D
Type: Dissertation
University: University of Maryland College Park
Candidate: Maneewongvatana, Songrit
Full Text: PDF
GTID: 1468390011497379
Subject: Computer Science
Abstract/Summary:
In nearest neighbor searching we are given a set of points in space, and the goal is to preprocess these points into a data structure so that, given any query point, the nearest point can be reported quickly. It is well known that the efficiency of nearest neighbor searching usually degrades as the dimension of the space increases. In high dimensions, this can mean that even a sophisticated data structure yields no faster search times than brute-force search. But the dimension of the space is not the only factor determining search efficiency. We believe that the distribution of the data set plays a significant role as well. We focus on data sets that are clustered in low-dimensional subspaces, and we limit our scope to memory-resident data structures.

We first study some variants of the kd-tree. These variants arise from changing the kd-tree's splitting rule. We present an extensive performance study of two new splitting rules: sliding-midpoint and minimum-ambiguity. Both improve on the standard splitting method when the data are low-dimensional and clustered. We also provide a theoretical analysis supporting the improvement in search efficiency when the data points are distributed in a low-dimensional subspace.

A popular way to improve search performance is approximate nearest neighbor search. However, a drawback of this method is its low predictability of errors: the search is given an allowable error tolerance ε, but the average error actually committed is often much smaller than ε. We introduce a novel data structure for nearest neighbor search called the os-tree. It is designed around a different error model, in which the user provides a small failure probability pf, and the search fails to return the true nearest neighbor with probability at most pf. The os-tree is built with the aid of a set of training points.
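The sliding-midpoint rule mentioned above splits a kd-tree cell at the midpoint of its longest side; if the data are so skewed that one child would be empty, the splitting plane "slides" to the nearest data point. The following is a minimal Python sketch of one such split, not the dissertation's implementation; the function name, the list-of-tuples point representation, and the explicit cell bounds are illustrative choices:

```python
def sliding_midpoint_split(points, cell_lo, cell_hi):
    """One split under the sliding-midpoint rule (illustrative sketch).

    points:  list of equal-length coordinate tuples inside the cell.
    cell_lo, cell_hi: per-dimension lower/upper bounds of the cell.
    Returns (axis, split_value, left_points, right_points).
    """
    dim = len(cell_lo)
    # Split across the longest side of the cell, at its midpoint.
    axis = max(range(dim), key=lambda d: cell_hi[d] - cell_lo[d])
    split = (cell_lo[axis] + cell_hi[axis]) / 2.0

    left = [p for p in points if p[axis] <= split]
    right = [p for p in points if p[axis] > split]

    # If the points all cluster on one side of the midpoint, slide the
    # plane to the nearest point so that neither child is empty.
    if not left:
        split = min(p[axis] for p in right)
        left = [p for p in points if p[axis] <= split]
        right = [p for p in points if p[axis] > split]
    elif not right:
        split = max(p[axis] for p in left)
        right = [p for p in points if p[axis] >= split]
        left = [p for p in points if p[axis] < split]
    return axis, split, left, right

# Example: a cell [0,10]^2 whose points all cluster near one corner.
axis, split, left, right = sliding_midpoint_split(
    [(8.5, 9.0), (9.0, 9.5), (9.5, 9.0)], (0.0, 0.0), (10.0, 10.0))
# The midpoint (x = 5.0) would leave the left child empty, so the plane
# slides right to x = 8.5, separating one point from the other two.
```

The slide is what keeps the tree well behaved on clustered data: a plain midpoint rule can produce long chains of empty cells, while sliding guarantees every split separates at least one point. (A sliding-midpoint rule of this kind later became the basis of the kd-tree construction in SciPy's `scipy.spatial.KDTree`.)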
We implemented the os-tree and provide both a theoretical and an empirical analysis of its search performance. We show that the os-tree achieves search efficiency as good as that of current methods, with a significantly better ability to predict errors.
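For contrast with the os-tree's failure-probability model, the ε-approximate guarantee discussed earlier says the reported point may be farther than the true nearest neighbor, but by no more than a factor of (1 + ε). A small pure-Python sketch of that guarantee follows; the helper names (`dist`, `exact_nn`, `approx_ok`) and the random test data are illustrative, not from the dissertation:

```python
import math
import random

def dist(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def exact_nn(points, q):
    """Brute-force nearest neighbor: the baseline any structure must beat."""
    return min(points, key=lambda p: dist(p, q))

def approx_ok(points, q, answer, eps):
    """Does `answer` satisfy the (1 + eps)-approximate guarantee for q?"""
    return dist(q, answer) <= (1.0 + eps) * dist(q, exact_nn(points, q))

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
q = (0.5, 0.5)
# The exact answer trivially satisfies the guarantee for any eps >= 0;
# an approximate search is free to return *any* point passing this test,
# which is why the error actually committed is often far below eps.
assert approx_ok(pts, q, exact_nn(pts, q), eps=0.0)
```

The slack in this test is exactly the predictability problem the abstract raises: ε bounds the worst case, not the typical error, whereas the os-tree's parameter pf directly bounds how often the answer is wrong at all.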
Keywords/Search Tags: Search, Nearest neighbor, Data, Low-dimensional, Os-tree, Points