Font Size: a A A

Approximate Nearest Neighbor Search For High-Dimensional Based On Nearest Neighbor Graph

Posted on:2019-04-20Degree:MasterType:Thesis
Country:ChinaCandidate:C LiuFull Text:PDF
GTID:2428330572451618Subject:Engineering
Abstract/Summary:PDF Full Text Request
With the advent of big data era,the efficiency of large-scale search becomes a research focus.Recently,an approximate nearest neighbor search method based on nearest-neighbor graph,which has high efficiency,accurate results and high neighbor rate in facing huge amounts of high-dimensional data search,enters the researchers' view.In this thesis,we first study approximate nearest neighbor algorithm based on nearest-neighbor graph,such as GNNS,k Graph etc.This algorithm constructs a neighbor graph in the offline stage.In the online stage,greedy algorithm is used to query on the nearest-neighbor graph which is the index structure of search.The greedy algorithm starts from a random point and continuously loads the neighbors of the current nearest neighbor on the nearest-neighbor graph.If the neighbor is closer to the query point than the current nearest neighbor,the current point is replaced.By continuously obtaining the local optimal solution,we can gradually approach the process of global optimal solution.However,due to the intrinsic deficiencies of greedy algorithms,as the number of iterations increases,the probability of the algorithm falling into a local optimum also increases significantly,which will reduce the accuracy and efficiency of approximate nearest neighbor search.For greedy algorithm,it is easy to fall into the local optimal defect in the nearest-neighbor graph.So we use the following strategies: In the online searching stage,a better starting point is provided for the greedy algorithm,which makes the search less immersed in local optimum and improves the accuracy and efficiency of nearest-neighbor search.A good starting point is to position the greedy selection near the query point in the nearest-neighbor graph.In this case,only a few iterations are required to access the querying results.In this thesis,a new ANN scheme based on the nearest-neighbor,NSH-NNG,is proposed.It combines the neighbor-sensitive hashing technique and the ANN search based on the nearest-neighbor graph.Neighbor-sensitive hashing is one of the hash learning methods,it takes distribution characteristics of the data set into consideration while generating hash function,and is more distinguishable for the points that is close to the search point.Thus it can maintain high accuracy while doing ANN search fast.In the preprocessing stage,the scheme constructs an NSH hash index and a nearest-neighbor graph;In the search stage,the query points are mapped into NSH hash codes.Firstly,a better starting point for the greedy algorithm is provided by neighbor-sensitive hashing hash index.This scheme reduces the probability that the greedy algorithm may fall into local optimum,and achieves the fast approximate nearest-neighbor search with high accuracy.In addition,while improving the search performance,this scheme also proposes a new fast neighbor graph construction method based on Data-Sensitive Hashing.Through machine learning,data-sensitive hashing,can evenly distribute similar data objects in the database into hash buckets.According to this characteristic,this thesis uses the divide-and-conquer method.Firstly,the entire database is evenly divided into several small subsets by DSH.In each subset,a linear search method is used to generate the sub-neighborhood graphs,and then the subgraphs are merged into the nearest neighbor graph.Finally,the NN-descent algorithm is used to optimize this nearest-neighbor graph.In this thesis,we analyze the complexity of NSH-NNG scheme theoretically,and our experimental results prove that it improves the performance of ANN search.At the same time,it proves the advantages of the fast neighbor graph construction method based on datasensitive hash in the graph construction algorithms.
Keywords/Search Tags:Nearest Neighbor Graph, Approximate Nearest Neighbor Search, Neighbor-Sensitive Hashing, Data-Sensitive Hashing
PDF Full Text Request
Related items