Font Size: a A A

A Study Of Nearest Neighbor Search In High-dimentional Space

Posted on:2015-11-08Degree:MasterType:Thesis
Country:ChinaCandidate:Y YuanFull Text:PDF
GTID:2308330464468926Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
The Nearest Neighbor Search(NN Search)in higher-dimension data space is widely applied in database, image processing and many other relevant areas. Faced with the Curse of Dimensionality problem which becomes more and more serious we investigate and implement the DCR(Data Co-Reduction) Algorithm. By utilizing currently developed co-clustering method, DCR reduces both size and dimensionality of the original data into a compact subspace. Index and KNN Search use the Filter And Refine strategy. We establish the tight distance lower-bound of the subspace yielded by co-clustering. Consequently, we increase the efficiency of the KNN search. Especially, DCR considers the duality between size and dimensionality to optimize similarity search. The experiment results on the large-scale data set indicate that DCR owns excellent performance of pruning power and response time.Recently, i Distance and its variants are acknowledged as the most promising solutions to the KNN search. However, i Distance approaches suffer from a drawback that the access to candidate objects requires a large number of random I/O operations. In order to guarantee the quality of returned results, sufficient objects should be verified, which would consume enormous I/O cost. To address this issue, we propose a novel method, namely ZOrder Index, which minimizes the number of page accesses through maximizing the number of candidate objects in each single page. The key contributions can be summarized as follows. Firstly, we define a measure to evaluate the distance between the ZOrder Code of two points,according to which the orders of points to be verified are determined. Hence, a linear order relationship on the set of ZOrder Code is created, and the set of ZOrder Code as well as data points can be sorted. Secondly, we prove that points with similar ZOrder Code are stored in consecutive disk pages. During the NN search, only several disk pages are necessary to be accessed for verification. With several pages loaded into the main memory, enough candidate objects are verified, which not only significantly reduces the response time but also improves the accuracy of the returned results. Our exhaustive empirical study over several real-world data sets demonstrates the superior efficiency and accuracy of ZOrdering Index for the NN search, compared with state-of-the-art methods.
Keywords/Search Tags:indexing, search quantification and coding, data co-reduction, filter and refine, high-dimensional
PDF Full Text Request
Related items