Font Size: a A A

NAQ-tree: Effective index structure for similarity search in high dimensional space

Posted on:2011-08-05Degree:Ph.DType:Thesis
University:University of Calgary (Canada)Candidate:Zhang, MingFull Text:PDF
GTID:2448390002964754Subject:Computer Science
Abstract/Summary:
Similarity search in high-dimensional metric space is the key operation in many applications, such as multi-media databases, content-based image retrieval, object recognition, molecular biology, medical imaging, security, wildlife, and critical health cases, among others. The high dimensionality and the huge size of the data set require an index structure to facilitate the search. State-of-the-art index structures are mostly built by partitioning the data set based on distances to certain reference point(s). Using the index, the search is confined to a small number of partitions. Based on careful investigation and analysis of the existing indexing techniques described in the literature, this thesis proposes a new index structure, called NAQ-tree (Nested-Approximate-eQuivalence-class tree), which overcomes the disadvantages of the investigated index structures. NAQ-tree is constructed by recursively dividing the data set into nested approximate equivalence classes. NAQ-tree is effective in serving variety of queries, including k-nearest neighbor, approximate k-nearest neighbor and reverse k-nearest neighbor queries.;While other researchers separate the filtering and verification stages of the process used to locate reverse nearest neighbors, we integrate the filtering and verification steps and use the information obtained in the verification stage to further improve the filtering rate. Our approach delivers results earlier and hence well serves real-time applications. Actually, incremental reporting of the result is at least equally important for multi-step nearest neighbor search to reduce the number of expensive distance computations required to complete the search and to better serve applications where the distance is known at run-time. Algorithms described in the literature separate multi-step nearest neighbor search into filtering and refinement stages with the result delivered by the end of the refinement stage. We extended an existing algorithm in a way to produce the nearest neighbors incrementally in an optimal way in the sense that a nearest neighbor is output as soon as it can be determined using the existing information; thus, nearest neighbors are produced in order. The conducted analysis and the reported comparative test results demonstrate the effectiveness of NAQ-tree in significantly improving the search efficiency.;Index Terms: high dimensionality, dimensionality reduction, indexing, similarity search, online monitoring, content-based image retrieval, k-nearest neighbor search, partitioning, reverse nearest neighbor search.;Approximate k-nearest neighbor search is essential for time critical applications which are mostly characterized by the need to search huge datasets; we reduce the dataset size by clustering, and then we use one representative from each cluster to build indexes using a set of NAQ-trees residing on multiple nodes of a parallel machine. In the query processing phase, the built indexes are queried in parallel and from each tree only a very small number of index nodes are searched to report the partial results which are combined into the final result. As reverse k-nearest neighbor queries are concerned, the properties of NAQ-tree provide strong pruning rules.
Keywords/Search Tags:Search, Naq-tree, K-nearest neighbor, Index, Applications, Reverse
Related items