Fast Locality Sensitive Hashing Algorithm for Approximate Nearest Neighbor Search: A Practical Data Mining Approach

Posted on:2013-02-13

Degree:Ph.D

Type:Dissertation

University:North Carolina Agricultural and Technical State University

Candidate:Buaba, Ruben

Full Text:PDF

GTID:1458390008970780

Subject:Engineering

Abstract/Summary:

PDF Full Text Request

Locality Sensitive Hashing (LSH) is an index-based data structure that allows spatial item retrieval over a large dataset. The performance of LSH depends on three main parameters: a performance measure, denoted by ρ; the number of hash tables, denoted by L and the number of hash functions, denoted by k. These parameters influence the computational, memory space and query time complexities of the data structure. The minimization of ρ at a specific approximation factor c, is dependent on the load factor, α. Over the years, α=4 has been used by researchers as the optimal load factor that gives the minimum value for ρ. In addition, researchers use a one-degree-of-freedom design in which k is chosen and used to compute L. In this research, it is shown that the choice of α=4 does not give minimum value for ρ. To guarantee low computational, memory space and query time complexities, an optimal load factor of α=5 is proposed. In addition, a new LSH algorithm is proposed in which the L is chosen and used to compute k. To bridge the semantic gab between humans and computers, a Euclicosine similarity metric is proposed to rank retrieved items. Experiments on the Defense Meteorological Satellite Program imagery dataset have shown that the proposed techniques cut the computational and memory space complexities by 50% each and improve the query time complexity by a factor of 2.

Keywords/Search Tags:

Data, Query time, Memory space, LSH, Factor

PDF Full Text Request

Related items

1	Research On Data Discretization And Classification Algorithm Based On Factor Space Theory
2	Research On Incremental Factor Reduction Algorithm Based On Factor Space Theory
3	Research Of High-Dimensional Space Query Algorithm Based On Space-Filling Curves
4	Real-time System Of Memory Management Technology And Implementation
5	Factor Space Vine Extraction In Factor Analysis Table
6	The Method Of Data Discretization Based On Factor Space Theory And It Combine With The Set Subtraction And Rotation Calculation
7	Research On Parallel Query For In-memory Database
8	Design Of The Differential Unitary Space-Time Block Codes And Applied In MIMO-OFDM
9	Reducing time and space costs of memory tracing
10	Research Of Manycore And On-Chip Memory Integration Based On 2.5D Technology