Study On The Efficient Approximate Nearest Neighbor Search For Massive Data

Posted on:2017-10-25

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhou

Full Text:PDF

GTID:2348330512468174

Subject:Engineering

Abstract/Summary:

PDF Full Text Request

With the arrival of the era of big data,many massive data have been unable to be managed,analyzed and processed within the time allowed by conventional software tools.On the other hand,the larger the data,the greater the potential value of it.How to quickly and efficiently retrieve similar data in big data is one hot topic of computer research.To overcome this problem,researchers have proposed approximate nearest neighbor search method,where the index is used to speed up data retrieval.Two well-known and effective methods of researching this issue are locality sensitive hashing and random grid.However,both methods have some problems.To ensure accuracy,the locality sensitive hashing algorithm needs to create more than one index table,resulting in high space complexity.Similarly,in order to ensure accuracy,random grid method needs to create a lot of copies,which will cost more external memory space.In order to solve the problem of too much space consumption and guarantee the accuracy in approximate neighbors,this paper proposes a new solution,namely locality sensitive hashing algorithm,based on grid.Algorithm in this paper absorbs the thought of grid division from random grid method and divides the data based on grid.This method will not change the spatial neighbor structure of the data.At the same time,it utilizes the advantage of locality sensitive hashing in dealing with high-dimensional data to search the grid and retrieve the approximate neighbor data via index.This method can effectively reduce the amount of data transmission.And since the method divides data into many grids,it greatly ensures the retrieval precision.We implement the algorithm in this paper under the framework of MapReduce.And the algorithm has good scalability.Experimental results show that the algorithm proposed in this paper has good performance when processing the enormous data in high-dimensional space.

Keywords/Search Tags:

Big Data, MapReduce, Approximate Nearest Neighbors, Locality Sensitive Hashing, Grid Index

PDF Full Text Request

Related items

1	Research Of Approximate K-Nearest Neighbors Search Algorithm Based On Locality Sensitive Hashing
2	Locality Sensitive Hashing Index Based On Neighborhood Collision Counting
3	Hash-based Approximate Nearest Neighbor Search For High-dimensional Data
4	Research On Local Sensitive Hashing And Approximate Nearest Neighbor Algorithm
5	Research Of Approximate Nearest Neighbor Search Based On Locality Sensitive Hashing
6	LSH-based Similarity Search Of Web Data
7	Reasearch On Locality Sensitive Hashing Based Approximate Nearest Neighbor(s) Searching Algorithm
8	Research Of The Efficient Index Structure For Approximate Nearest Neighbor Algorithm
9	Research On Integrated Algorithm Of Locality Sensitive Hashing And Matrix Factorization On GPU Platform
10	Efficient Algorithms For Approximate Aggregation And Nearest Neighbor Queries Over Multi-Dimensional Data