Font Size: a A A

Fast Locality Sensitive Hashing Algorithm for Approximate Nearest Neighbor Search: A Practical Data Mining Approach

Posted on:2013-02-13Degree:Ph.DType:Dissertation
University:North Carolina Agricultural and Technical State UniversityCandidate:Buaba, RubenFull Text:PDF
GTID:1458390008970780Subject:Engineering
Abstract/Summary:
Locality Sensitive Hashing (LSH) is an index-based data structure that allows spatial item retrieval over a large dataset. The performance of LSH depends on three main parameters: a performance measure, denoted by ρ; the number of hash tables, denoted by L and the number of hash functions, denoted by k. These parameters influence the computational, memory space and query time complexities of the data structure. The minimization of ρ at a specific approximation factor c, is dependent on the load factor, α. Over the years, α=4 has been used by researchers as the optimal load factor that gives the minimum value for ρ. In addition, researchers use a one-degree-of-freedom design in which k is chosen and used to compute L. In this research, it is shown that the choice of α=4 does not give minimum value for ρ. To guarantee low computational, memory space and query time complexities, an optimal load factor of α=5 is proposed. In addition, a new LSH algorithm is proposed in which the L is chosen and used to compute k. To bridge the semantic gab between humans and computers, a Euclicosine similarity metric is proposed to rank retrieved items. Experiments on the Defense Meteorological Satellite Program imagery dataset have shown that the proposed techniques cut the computational and memory space complexities by 50% each and improve the query time complexity by a factor of 2.
Keywords/Search Tags:Data, Query time, Memory space, LSH, Factor
Related items