A Study Of Nearest Neighbor Search In High-dimentional Space

Posted on:2015-11-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y Yuan

Full Text:PDF

GTID:2308330464468926

Subject:Computer system architecture

Abstract/Summary:

PDF Full Text Request

The Nearest Neighbor Search(NN Search)in higher-dimension data space is widely applied in database, image processing and many other relevant areas. Faced with the Curse of Dimensionality problem which becomes more and more serious we investigate and implement the DCR(Data Co-Reduction) Algorithm. By utilizing currently developed co-clustering method, DCR reduces both size and dimensionality of the original data into a compact subspace. Index and KNN Search use the Filter And Refine strategy. We establish the tight distance lower-bound of the subspace yielded by co-clustering. Consequently, we increase the efficiency of the KNN search. Especially, DCR considers the duality between size and dimensionality to optimize similarity search. The experiment results on the large-scale data set indicate that DCR owns excellent performance of pruning power and response time.Recently, i Distance and its variants are acknowledged as the most promising solutions to the KNN search. However, i Distance approaches suffer from a drawback that the access to candidate objects requires a large number of random I/O operations. In order to guarantee the quality of returned results, sufficient objects should be verified, which would consume enormous I/O cost. To address this issue, we propose a novel method, namely ZOrder Index, which minimizes the number of page accesses through maximizing the number of candidate objects in each single page. The key contributions can be summarized as follows. Firstly, we define a measure to evaluate the distance between the ZOrder Code of two points,according to which the orders of points to be verified are determined. Hence, a linear order relationship on the set of ZOrder Code is created, and the set of ZOrder Code as well as data points can be sorted. Secondly, we prove that points with similar ZOrder Code are stored in consecutive disk pages. During the NN search, only several disk pages are necessary to be accessed for verification. With several pages loaded into the main memory, enough candidate objects are verified, which not only significantly reduces the response time but also improves the accuracy of the returned results. Our exhaustive empirical study over several real-world data sets demonstrates the superior efficiency and accuracy of ZOrdering Index for the NN search, compared with state-of-the-art methods.

Keywords/Search Tags:

indexing, search quantification and coding, data co-reduction, filter and refine, high-dimensional

PDF Full Text Request

Related items

1	Search and indexing of high-dimensional feature spaces for similarity retrieval
2	Research And Application Of High Efficient Attribute Reduction For High Dimensional Data Based On Rough Sets
3	Research Of Indexing Technology In Medical Image Retrieval Platform
4	High-dimensional Data Set Indexing Technology
5	Learning And Indexing Structural Representations Of Large Scale High Dimensional Data
6	Research On Dimensionality Reduction And Indexing Algorithm Of Multimedia Database And System Implementation
7	Research Of High Dimensional Retrival Algorithms Based On Adaptive Cluster Distance Bounding
8	Research On K-nearest Neighbor Search Algorithm In High Dimensional Space
9	Hash-based Approximate Nearest Neighbor Search For High-dimensional Data
10	Research On And Design Of Dimensionality Reduction Algorithm For The High Dimensional Data