Font Size: a A A

Research On Approximate Nearest Neighbor Search Based On Query-directed Graph

Posted on:2022-01-07Degree:MasterType:Thesis
Country:ChinaCandidate:Z ZhaoFull Text:PDF
GTID:2518306497972519Subject:Software engineering
Abstract/Summary:PDF Full Text Request
With the continuous development of information technology and the continuous increase of data processing,and how to quickly search in massive,high-dimensional datasets becomes more and more important.High-dimensional approximate nearest neighbor search(ANNS)has drawn much attention over decades since its applications in data mining,machine learning and artificial intelligence.For sparse discrete data(like documents),we can build advanced index structures(like inverted indexes)to perform effective nearest neighbor search.But for dense continuous vectors,there are currently four types of methods: tree-based methods,hash-based methods,quantizationbased methods,and graph-based methods.Among them,the graph-based method has attracted attention due to its high accuracy and low search complexity.Experimental results show that the performance of the graph-based method in the commonly used Euclidean space is much better than other popular algorithms.The reason is that the graph-based method can more accurately express the neighbor relationship than other methods.In the case of achieving the same search accuracy,the non-graph method needs to check more neighbor nodes,resulting in poor search performance.Many researchers have turned to graph-based research due to the outstanding performance of graphs in ANNS.While various graph-based methods use different construction strategies,the widely-accepted principle is to make graphs as sparse as possible to reduce the search cost.In this paper,we observed that sparse graph couldn't guarantee high accuracy(close to or equal to 100%),or incurs significant cost in the high-precision region.To this end,this article uses a novel edge selection strategy to construct a graph,and named it as the Query-Directed Dense Graph(QDG),and use the query-oriented method for efficient query on this graph,which greatly reduces the search cost.The construction and query process of QDG is described in detail as follows:(1)To construct QDG index,it is necessary to construct an initial k-nearest neighbor graph first.There are many methods to obtain this k nearest neighbor graph,such as NN-descent,KGraph or Faiss.On the initial k NN graph,this paper obtains the initial node by calculating the approximate center of the dataset(called the navigation node),and uses the greedy search method from the initial node to select the neighbor candidate set for each node on the graph.We designed a new edge selection strategy to ensure the creation of a denser graph by adjusting the minimum angle between node neighbors.According to the knowledge of graph theory,the connectivity of the graph is related to the average out-degree of the graph.Although the dense graph may increase the complexity of the search,it also increases the connectivity of the graph network.(2)In order to reduce the query complexity,this paper uses the classic K-means method,for each node on the graph,use the cosine distance to cluster its neighbor set,so that neighbors with similar angles are clustered into one category.Although the graph itself is dense,in the query process,we only check some neighbors that are close to the query point to reduce the search cost.(3)Due to the high storage cost of the cluster center itself,this paper uses a variant of Product Quantization(PQ)to store the cluster centers obtained by K-means in a coded form,and the reconstructed cluster center is obtained through the pre-calculation table to calculate the angle between the cluster center and the query point,which optimizes the space and time performance.Through a series of experiments on datasets(from real life),we found that the method proposed in this paper improves the query time by up to 2.2 times.And since this method is based on greedy search algorithm,it can also be transplanted to most graph-based search algorithms and shows good performance.
Keywords/Search Tags:ANNS, k-nearest neighbor graph, K-means clustering, PQ
PDF Full Text Request
Related items