Research Of Approximate K-Nearest Neighbors Search Algorithm Based On Locality Sensitive Hashing

Posted on: 2015-09-27

Degree: Master

Type: Thesis

Country: China

Candidate: W M Qiu

GTID: 2298330467986840

Subject: Computer system architecture

Abstract/Summary:

In recent years, network information retrieval technology has developed continuously and rapidly, and many applications have seen a quick increase in both the volume and the dimensionality of the data they must process. How to search large amounts of high-dimensional data efficiently has therefore become an important research topic. Because of the "curse of dimensionality", many tree-based index structures and their variants no longer meet users' requirements: in high-dimensional spaces these tree-based methods become slower than brute-force search. LSH (Locality Sensitive Hashing) is the most popular approach to approximate similarity search in high-dimensional spaces and is well suited to it. However, a significant drawback of LSH is that it needs many hash tables to guarantee good search quality, which results in high space cost and low time efficiency. Moreover, as data volumes keep growing and MapReduce becomes widely used, traditional centralized methods can no longer handle massive data. To address these problems, this thesis makes the following contributions:

(1) To overcome the problems of high space cost and low search efficiency, we propose a new two-level hybrid scheme, called LSRP-tree, which first divides the dataset into many subgroups with an RP-tree (random projection tree) structure and then constructs LSH hash tables for each subgroup. Building on this two-level scheme and fully exploiting the properties of hash-function collisions, we propose two approximate similarity search algorithms, called CCP and CCF, which efficiently perform approximate k-Nearest Neighbors search in high-dimensional space.
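The abstract names the two-level LSRP-tree scheme of (1) but does not give the RP-tree split rule, the hash family, or the CCP/CCF candidate rules. The following is therefore only a minimal sketch under stated assumptions, not the thesis's algorithm: level one splits each node at the median of the projections onto a random Gaussian direction; level two builds Gaussian (p-stable) LSH tables inside each leaf, as in E2LSH; and the query keeps the leaf points that collide with the query in at least `min_count` tables before ranking them by exact distance. That last step is a generic collision-counting filter in the spirit of C2LSH, not CCP or CCF themselves. All names and parameters (`LeafLSH`, `build_rp_tree`, `knn_query`, `leaf_size`, `min_count`, `w`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


class LeafLSH:
    """Level 2 (assumed design): LSH tables for one RP-tree leaf.
    n_tables tables, each keyed by n_funcs concatenated p-stable hashes
    h(v) = floor((a . v + b) / w) with a ~ N(0, I) and b ~ U(0, w)."""

    def __init__(self, points, ids, dim, rng, n_tables=4, n_funcs=2, w=4.0):
        self.w = w
        self.funcs = [[([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, w))
                       for _ in range(n_funcs)] for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        for i in ids:
            for table, fs in zip(self.tables, self.funcs):
                table[self._key(fs, points[i])].append(i)

    def _key(self, fs, v):
        return tuple(math.floor((dot(a, v) + b) / self.w) for a, b in fs)

    def collision_counts(self, q):
        """Number of tables in which each stored id shares q's bucket."""
        counts = defaultdict(int)
        for table, fs in zip(self.tables, self.funcs):
            for i in table.get(self._key(fs, q), []):
                counts[i] += 1
        return counts


def build_rp_tree(points, ids, dim, rng, leaf_size=32):
    """Level 1 (assumed split rule): median split on a random direction.
    leaf_size must be >= 1 so that every split strictly shrinks the node."""
    if len(ids) <= leaf_size:
        return {"leaf": LeafLSH(points, ids, dim, rng)}
    direction = [rng.gauss(0, 1) for _ in range(dim)]
    proj = sorted((dot(direction, points[i]), i) for i in ids)
    mid = len(proj) // 2
    return {
        "dir": direction,
        "thr": proj[mid][0],
        "left": build_rp_tree(points, [i for _, i in proj[:mid]], dim, rng, leaf_size),
        "right": build_rp_tree(points, [i for _, i in proj[mid:]], dim, rng, leaf_size),
    }


def knn_query(tree, points, q, k, min_count=1):
    """Route q to one leaf, keep ids colliding in >= min_count tables,
    then rank those candidates by exact Euclidean distance."""
    node = tree
    while "leaf" not in node:
        node = node["left"] if dot(node["dir"], q) < node["thr"] else node["right"]
    counts = node["leaf"].collision_counts(q)
    candidates = [i for i, c in counts.items() if c >= min_count]
    return sorted(candidates, key=lambda i: math.dist(points[i], q))[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    points = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
    tree = build_rp_tree(points, list(range(len(points))), 8, rng)
    print(knn_query(tree, points, points[7], k=5))
```

Because this sketch probes only the single leaf a query is routed to, a true neighbor that lands in a sibling leaf can be missed; that is the usual trade-off of partitioning the data before hashing, and raising `min_count` trades recall for fewer distance computations.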
Compared with the LSB-tree/LSB-forest methods, our methods outperform these baselines in search efficiency, search quality and space cost.

(2) To address the insufficient ability of centralized methods to handle massive data, this thesis studies LSH-based k-Nearest Neighbors search under the popular MapReduce programming model. We propose a novel LSH-based distributed inverted index scheme and design an efficient search algorithm on top of it, called H-c2kNN.

Finally, we implement the proposed methods and conduct many experiments on real and synthetic datasets; the results show that the proposed approaches achieve good performance and high scalability.
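The abstract does not describe how the distributed inverted index of (2) or H-c2kNN is laid out. As a minimal single-process sketch, assume the index is keyed by (hash function id, bucket id) with point ids as postings, and that a query counts, for each point, how many hash functions place it in the query's bucket, in the spirit of C2LSH's collision counting. The `map_phase`, `shuffle` and `reduce_phase` functions only simulate the MapReduce data flow in plain Python; they are not a Hadoop API, and every name and parameter here (`make_hashes`, `build_index`, `knn`, `w`, `min_count`) is hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import chain


def make_hashes(dim, n_funcs, w, seed=0):
    """n_funcs Gaussian (p-stable) hash functions, each stored as (a, b)."""
    rng = random.Random(seed)
    return [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, w))
            for _ in range(n_funcs)]


def bucket(fn, v, w):
    a, b = fn
    return math.floor((sum(x * y for x, y in zip(a, v)) + b) / w)


def map_phase(record, hashes, w):
    """Mapper: for one (point id, vector) record emit ((function id, bucket), point id)."""
    pid, vec = record
    for f, fn in enumerate(hashes):
        yield (f, bucket(fn, vec, w)), pid


def shuffle(pairs):
    """Stand-in for the framework's shuffle/sort: group mapper output by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: turn one key group into a sorted posting list."""
    return key, sorted(values)


def build_index(points, hashes, w):
    mapped = chain.from_iterable(map_phase(rec, hashes, w) for rec in enumerate(points))
    return dict(reduce_phase(key, vals) for key, vals in shuffle(mapped).items())


def knn(index, points, q, k, hashes, w, min_count):
    """Count, per point id, how many hash functions put it in q's bucket;
    keep ids with count >= min_count and rank them by exact distance."""
    counts = defaultdict(int)
    for f, fn in enumerate(hashes):
        for pid in index.get((f, bucket(fn, q, w)), []):
            counts[pid] += 1
    candidates = [pid for pid, c in counts.items() if c >= min_count]
    return sorted(candidates, key=lambda pid: math.dist(points[pid], q))[:k]


if __name__ == "__main__":
    rng = random.Random(7)
    points = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
    hashes = make_hashes(dim=16, n_funcs=20, w=2.0)
    index = build_index(points, hashes, w=2.0)
    print(knn(index, points, points[3], k=5, hashes=hashes, w=2.0, min_count=8))
```

In a real deployment the mapper and reducer would run as distributed tasks (for example on Hadoop) and the posting lists would live in distributed storage; the in-memory `shuffle` here only mimics the grouping by key that the framework performs between the two phases.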

Keywords/Search Tags:

Locality sensitive hashing, k-Nearest Neighbors, high dimensional data

Related items

1	Locality Sensitive Hashing Index Based On Neighborhood Collision Counting
2	Hash-based Approximate Nearest Neighbor Search For High-dimensional Data
3	Research On Integrated Algorithm Of Locality Sensitive Hashing And Matrix Factorization On GPU Platform
4	Study On The Efficient Approximate Nearest Neighbor Search For Massive Data
5	Algorithms for high dimensional data
6	Research On Similarity Image Retrieval Based On Locality Sensitive Hashing And Structured P2P Network
7	Research Of Approximate Nearest Neighbor Search Based On Locality Sensitive Hashing
8	Research On Local Sensitive Hashing And Approximate Nearest Neighbor Algorithm
9	Research On Locality Sensitive Hashing Based Approximate Nearest Neighbor(s) Searching Algorithm
10	Research On High-dimensional Index In Large-scale Image Retrieval