Font Size: a A A

Techniques for efficient k-nearest neighbor searching in non-ordered discrete and hybrid data spaces

Posted on:2011-12-30Degree:Ph.DType:Dissertation
University:Michigan State UniversityCandidate:Kolbe, Dashiell MatthewsFull Text:PDF
GTID:1448390002950002Subject:Engineering
Abstract/Summary:
Similarity searches/queries in Non-Ordered Discrete Data, Spaces (NDDS) and Hybrid Data Spaces (HDS) are becoming increasingly useful in areas such as bioinformatics multimedia text retrieval, audio and video compression, data-mining, and E-commerce. The objective of this dissertation is to develop and analyze novel methods to support similarity searches in such spaces.;In this dissertation, we first discuss performing k-Nearest Neighbor (k-NN) searches in NDDSs. Performing such searches in NDDSs raises new challenges. Due to the coarse granularity of the commonly used Hamming distance measure, a nearest neighbor query in an NDDS may lead to a large set of candidate solutions, creating a high degree of non-determinism. We propose a new distance measure that reduces the number of candidate solutions for a query while preserving the essential properties of Hamming distance. We have also implemented nearest neighbor queries using multidimensional database indexing in NDDSs. We use the properties of our multidimensional NDDS index to derive the probability of encountering new neighbors within specific regions of the index. This probability is used to develop a new search ordering heuristic. Our experiments demonstrate that our nearest neighbor algorithm is efficient in finding nearest neighbors in NDDSs and that our heuristics are effective in improving the performance of such queries.;We then discuss our work on providing a generalization of our GEH distance. This generalized form allows our distance measure to be applied to a broad range of applications. Of these, we discuss a new rank based implementation well suited to applications with heavily skewed data distributions. Our experiments demonstrate the benefits of an adaptable distance metric by presenting scenarios that demonstrate performance changes depending upon the distance measure used.;Finally, we discuss extending k-NN searching to HDS. We consider the challenges of exploiting both the CDS and NDDS properties of HDS for optimizing potential search algorithms. In particular we consider how key search information is maintained in HDS data structures and what rules must be observed to guarantee the correctness of search results in such spaces. Further, the concept of search execution stages is introduced to develop efficient k-NN search algorithms for HDS. Lastly, a theoretical performance model is developed for HDS searching to validate our experimental results.
Keywords/Search Tags:Search, HDS, Data, Spaces, Nearest neighbor, NDDS, Efficient, Distance measure
Related items