Font Size: a A A

Reasearch On Locality Sensitive Hashing Based Approximate Nearest Neighbor(s) Searching Algorithm

Posted on:2016-09-23Degree:MasterType:Thesis
Country:ChinaCandidate:L ZhangFull Text:PDF
GTID:2308330473465339Subject:Communication and Information System
Abstract/Summary:PDF Full Text Request
With the development and popularity of Internet and computer technologies, Big Data era has come. Similarity search is frequently used to aquire information, and has become even more important requirement. Big Data brings a wealth of information and high-level experience to people, but poses serious chanlleges to processing data at the same time. Tradional retrival methods based on spatial and data partition are no longer effective due to the curse of dimensionality. To solve this problem, approximate similarity search is provided, to decrease query time at the cost of slightly lower accuracy.LSH(Locality Sensitive Hashing), as one of the state-of-the-art approximate similarity search technologies, has increasingly become a hot spot. Its basic idea can be summarized as: firstly filtering data items with low possibility of being similar to query item, then calculate the exact similarity between candidate items and query item, which significantly decrease computational complexity and thus can return query result much faster than traditional methods. Algorithms based on LSH use a large quantity of hash tables to guarantee the precision of calculation with high possibility, which leads to great storage burden.This thesis proposes a Multi-probe based distributed LSH approximate nearest neighbor(s) searching method, and provides a Hamming Distance based LSH approximate nearest neighbor(s) searching algorithm. The distributed method constrcuts distributed index on Chord structure with Multi-probe technology, which obviously decreases required number of hash buckets under the condition of ensuring the recall; the centralized algorithm generates sub-fingerprints by randomly extracting from the fingerprint and construct index tables, at the same time, weights are assigned to data items by comparing the hamming distance between the short fingerprints of query item and those of data items to diminish the size of candidate set. In all, focusing on different secnarios, the above two methods decreases the space required for storing hash tables, meanwhile improving the performance of nearest neighbor(s) searching effectively.
Keywords/Search Tags:Approximate Nearest Neighbor(s) Searching, LSH, Locality Sensitive Hashing, Distributed Search
PDF Full Text Request
Related items