
Research On Hashing Methods Towards Large-scale Similarity Computation And Search

Posted on: 2016-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Q Ji
GTID: 1108330503456152
Subject: Computer Science and Technology
Abstract/Summary:
Data volumes explode as the Internet grows. Efficiently computing pairwise similarities and conducting similarity search over large-scale data is a fundamental yet challenging problem with a wide range of applications. Locality-sensitive hashing (LSH) is an effective tool for this problem. LSH methods compress data into compact hash codes; by computing a simple distance (e.g., the Hamming distance) between hash codes, we can efficiently estimate the pairwise similarity or distance between the original data items. Such similarity-preserving hash codes can therefore serve as compressed representations of the original data and be used as inputs to machine learning or data mining algorithms that operate on pairwise similarities, yielding both acceleration and storage reduction. In addition, hash codes can be indexed in hash tables to enable fast similarity search.

Various locality-sensitive hashing methods exist for different kinds of data and similarity measures. However, existing methods suffer from large estimation variance, so many hash functions are needed to reach a desired level of precision, leading to high computation and storage costs. In this thesis, we first study locality-sensitive hashing for simply structured data, e.g., vectors and sets, and propose to sample hash functions in a structured manner to reduce the estimation variance or the hashing time. On the other hand, previous studies mainly focus on hashing methods for simply structured data, while many kinds of data in real applications, such as subspaces and graphs, cannot be represented by simple structures. We propose a hashing method for subspaces, with an extension to 3rd-order tensors. More specifically, we make the following contributions:

1. For vectors, we propose batch-orthogonal locality-sensitive hashing (BOLSH), an improvement over sign-random-projection locality-sensitive hashing (SRP-LSH). BOLSH uses hash functions sampled in a structured manner, namely hash functions constructed from batch-orthogonal random projection vectors. We provide theoretical analysis showing that BOLSH gives an unbiased estimate of the angle between vectors, with smaller variance than SRP-LSH. Our experiments verify the theoretical results.

2. For sets, building on min-hashing, we propose min-max hashing, which again samples hash functions in a structured manner, i.e., paired min and max functions. We prove that min-max hashing halves the hashing time and provides an unbiased estimate of the Jaccard similarity with smaller variance. Our experiments verify the theoretical results.

3. We propose an angular-similarity-preserving locality-sensitive hashing method for subspaces and 3rd-order tensors, with theoretical analysis and experimental verification. We also propose a fast sign-random-projection method and a bilinear random projection method to reduce the hashing time.
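The SRP-LSH estimator that contribution 1 improves upon can be sketched in a few lines: each hash bit is the sign of a random projection, and the fraction of differing bits between two codes is an unbiased estimate of the angle divided by pi. The batch-orthogonal variant below is a minimal illustration of the idea only, assuming batches of Gaussian projections orthogonalized via QR decomposition; it is not the thesis's exact construction or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 256  # data dimension, number of hash bits

# Two vectors with a known angle between them.
u = rng.standard_normal(d)
v = rng.standard_normal(d)
true_angle = np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# SRP-LSH: k independent Gaussian projection vectors.
W_srp = rng.standard_normal((k, d))

# Batch-orthogonal sketch (assumption, not the thesis's exact method):
# draw Gaussian matrices in batches of d and orthogonalize each batch.
batches = []
for _ in range(k // d):
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    batches.append(Q)
W_bolsh = np.vstack(batches)

def estimate_angle(W, x, y):
    # Hash bit = sign of the projection; the Hamming-distance fraction
    # between the two codes estimates angle / pi.
    hx = (W @ x) >= 0
    hy = (W @ y) >= 0
    return np.mean(hx != hy) * np.pi

print(true_angle, estimate_angle(W_srp, u, v), estimate_angle(W_bolsh, u, v))
```

With k = 256 bits both estimators land close to the true angle; the point of the batch-orthogonal construction is that, for the same number of bits, the structured projections yield a lower-variance estimate than fully independent ones.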
Keywords/Search Tags: locality-sensitive hashing, Hamming distance, similarity search