Font Size: a A A

Research Of Local Sensitive Hash Index Based On Nearest Neighbor Graph

Posted on:2018-01-11Degree:MasterType:Thesis
Country:ChinaCandidate:L X WangFull Text:PDF
GTID:2348330518998564Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
In the information age,data always affect everyone's life.With the development of society and the progress of the times,the dimension of the data is becoming higher and higher,and the progress of technology makes the data collection becomes more and more easily,causes the database increasing in large scale,complexity is increasing,so how to find the most similar(nearest)one or more data to the target data quickly from the mass of high dimensional data is becoming a problem.For low dimensional data sets,we can easily solve the problem by linear computation,but for a large number of high-dimensional data sets,linear search requires a very large amount of computation and time-consuming.Therefore,in order to solve this problem,we introduce an index technology which we call the approximate nearest neighbor(ANN)search.In high dimensional space,approximate nearest neighbor search has become a basic problem in massive multimedia data retrieval.At present,local sensitive hashing(LSH)and its variants are considered to be the most effective solution to approximate nearest neighbor search.The basic idea of LSH is: two adjacent points in the original data space have a high probability of closing each other in the new data space through the same mapping or projection,and the probability of two points which are not adjacent mapping into the same bucket is very small.In this paper,we focus on the research and analysis of the sensitive hashing technique,the space curve and the nearest neighbor graph.In terms of locally sensitive hashing techniques,we conduct a study and research about multiple hash algorithms,analyze their advantages and disadvantages.The advantage of locality sensitive hashing is able to achieve rapid filtering of candidate points,by constructing a composite hash function filtering effectively the irrelevant points to get high quality candidate sets.But the drawback is that access to the candidate point dependent on a random large number of I/O,and there are a lot of false negatives.In order to obtain high quality returned results,we need to build multiple hash table,visiting the candidate point enough,which leads to the huge cost of time and space.The space curve can be used to establish the linear order relation of the compound hash value,so as to realize the fast location of the neighborhood.The nearest neighbor graph technique can quickly converge to the better candidate points with the help of the adjacency relation between the data points,but it is easy to fall into local optimum due to the influence of the initial search points.In order to overcome the shortcomings of the current LSH methods,this paper proposes a new algorithm--NNG-LSH,which uses the space curve and the nearest neighbor graph to achieve fast neighbor location and Local convergence of nearest neighbor search.First we establish a linear order(alphabet order)of compound hash value by means of the space curve,then the points in the original data set are sorted in the increasing order of their corresponding compound hash keys.Also,the points of similar hash value can be stored in consecutive disk pages,so as to reduce the random I/Os to access in the query process and the rapid positioning in the field of implementation.In the process of approximate nearest neighbor search,we introduce the nearest neighbor graph,complete convergence through the neighbor points of the page where the query point locate.Thus NNG-LSH achieves the local neighbor optimization,improve the accuracy of the return results.Extensive experiments conducted over four real-life high-dimensional data sets demonstrate the superiority of NNG-LSH over the state-of-the-art approaches,with respect to the accuracy of results and space requirement.
Keywords/Search Tags:High-dimensional Data, Approximate Nearest Neighbor, Locally Sensitive Hashing, Spatial Curve, Nearest Neighbor Graph
PDF Full Text Request
Related items