Font Size: a A A

Study On The Efficient Approximate Nearest Neighbor Search For Massive Data

Posted on:2017-10-25Degree:MasterType:Thesis
Country:ChinaCandidate:L ZhouFull Text:PDF
GTID:2348330512468174Subject:Engineering
Abstract/Summary:PDF Full Text Request
With the arrival of the era of big data,many massive data have been unable to be managed,analyzed and processed within the time allowed by conventional software tools.On the other hand,the larger the data,the greater the potential value of it.How to quickly and efficiently retrieve similar data in big data is one hot topic of computer research.To overcome this problem,researchers have proposed approximate nearest neighbor search method,where the index is used to speed up data retrieval.Two well-known and effective methods of researching this issue are locality sensitive hashing and random grid.However,both methods have some problems.To ensure accuracy,the locality sensitive hashing algorithm needs to create more than one index table,resulting in high space complexity.Similarly,in order to ensure accuracy,random grid method needs to create a lot of copies,which will cost more external memory space.In order to solve the problem of too much space consumption and guarantee the accuracy in approximate neighbors,this paper proposes a new solution,namely locality sensitive hashing algorithm,based on grid.Algorithm in this paper absorbs the thought of grid division from random grid method and divides the data based on grid.This method will not change the spatial neighbor structure of the data.At the same time,it utilizes the advantage of locality sensitive hashing in dealing with high-dimensional data to search the grid and retrieve the approximate neighbor data via index.This method can effectively reduce the amount of data transmission.And since the method divides data into many grids,it greatly ensures the retrieval precision.We implement the algorithm in this paper under the framework of MapReduce.And the algorithm has good scalability.Experimental results show that the algorithm proposed in this paper has good performance when processing the enormous data in high-dimensional space.
Keywords/Search Tags:Big Data, MapReduce, Approximate Nearest Neighbors, Locality Sensitive Hashing, Grid Index
PDF Full Text Request
Related items