Font Size: a A A

Efficient Approximate Nearest Neighbor Search Based On Navigating Spreading-out Graph

Posted on:2020-04-21Degree:MasterType:Thesis
Country:ChinaCandidate:C X WangFull Text:PDF
GTID:2428330575459734Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Approximate Nearest Neighbor Search(ANNS)is a fundamental problem in database,data mining and artificial intelligence.A practical and scalable ANNS algorithm should be both fast and memory efficient.Compared to the traditional tree-based and hashing methods,the graph-based approaches have attracted much attention from both academics and industry due to its high accuracy and performance.Although many early graph-based approaches can guarantee a low search time complexity theoretically,they all suffer from the problem of high indexing time complexity.which makes them impractical in many big data scenarios.Recently,some graph-based methods have been proposed to reduce indexing complexity by approximating the traditional graphs and have achieve d revolutionary performancebut they are still not efficient enough to be practical in many industrial scenarios.This thesis conduct a more in-deep research of graph structure based on the current research to tackle the problems and difficulties above.More specifically,the author take the following four aspects into consideration:(1)ensuring the connectivity of the graph:(2)lowering the average out-degree of the graph for fast traversal;(3)shortening the search path;(4)reducing the index size.Based on the four aspects above,the author propose a novel graph structure called Monotonic Relative Neighborhood Graph(MRNG).which can guarantee a very low search complexity(close to logarithmic time complexity).However,due to the high construction time complexity,it is impractical to construct MRNG on large-scale datasets directly.So instead,the author proposed another novel graph structure called Navigating Spreading-out Graph(NSG)which is meant to approximate the MRNG.Thus the author have a novel algorithm for ANNS with both efficient incdexiig and search performance.Extensive experiments on widely used large-scale datasets are conducted on the proposed NSG algorithm.The experiment results demonstrate that NSG outperforms all the existing algorithms significantly.What's more,the author propose a prototypal design of NSG-based distributed large-scale fast approximate nearest neighbor search system for industrial scenarios,and verify the feasibility of the proposed system on a 100-million-scale dataset.
Keywords/Search Tags:Information Retrieval, Nearest Neighbor Search, Proximity Graph
PDF Full Text Request
Related items