Reasearch On Locality Sensitive Hashing Based Approximate Nearest Neighbor(s) Searching Algorithm

Posted on:2016-09-23

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhang

Full Text:PDF

GTID:2308330473465339

Subject:Communication and Information System

Abstract/Summary:

PDF Full Text Request

With the development and popularity of Internet and computer technologies, Big Data era has come. Similarity search is frequently used to aquire information, and has become even more important requirement. Big Data brings a wealth of information and high-level experience to people, but poses serious chanlleges to processing data at the same time. Tradional retrival methods based on spatial and data partition are no longer effective due to the curse of dimensionality. To solve this problem, approximate similarity search is provided, to decrease query time at the cost of slightly lower accuracy.LSH(Locality Sensitive Hashing), as one of the state-of-the-art approximate similarity search technologies, has increasingly become a hot spot. Its basic idea can be summarized as: firstly filtering data items with low possibility of being similar to query item, then calculate the exact similarity between candidate items and query item, which significantly decrease computational complexity and thus can return query result much faster than traditional methods. Algorithms based on LSH use a large quantity of hash tables to guarantee the precision of calculation with high possibility, which leads to great storage burden.This thesis proposes a Multi-probe based distributed LSH approximate nearest neighbor(s) searching method, and provides a Hamming Distance based LSH approximate nearest neighbor(s) searching algorithm. The distributed method constrcuts distributed index on Chord structure with Multi-probe technology, which obviously decreases required number of hash buckets under the condition of ensuring the recall; the centralized algorithm generates sub-fingerprints by randomly extracting from the fingerprint and construct index tables, at the same time, weights are assigned to data items by comparing the hamming distance between the short fingerprints of query item and those of data items to diminish the size of candidate set. In all, focusing on different secnarios, the above two methods decreases the space required for storing hash tables, meanwhile improving the performance of nearest neighbor(s) searching effectively.

Keywords/Search Tags:

Approximate Nearest Neighbor(s) Searching, LSH, Locality Sensitive Hashing, Distributed Search

PDF Full Text Request

Related items

1	Research Of Approximate Nearest Neighbor Search Based On Locality Sensitive Hashing
2	Locality Sensitive Hashing Index Based On Neighborhood Collision Counting
3	Hash-based Approximate Nearest Neighbor Search For High-dimensional Data
4	Research On Local Sensitive Hashing And Approximate Nearest Neighbor Algorithm
5	Research Of Algorithm For Crowd Abnormal Behavior Detection Based On The Technology Of Approximate Nearest Neighbor Search
6	Locality Sensitive Hashing Algorithm Based On P-stable Distribution And Space Sphere Lattice
7	Approximate Nearest Neighbor Search For High-Dimensional Based On Nearest Neighbor Graph
8	Efficient Algorithms For Approximate Aggregation And Nearest Neighbor Queries Over Multi-Dimensional Data
9	Study On The Efficient Approximate Nearest Neighbor Search For Massive Data
10	Study On Approximate Nearest Neighbor Search Over Encrypted Data In Cloud Computing