Font Size: a A A

Research And Application Of Approximate Nearest Neighbor Algorithm Based On Navigable Small World

Posted on:2021-03-16Degree:MasterType:Thesis
Country:ChinaCandidate:C LiFull Text:PDF
GTID:2428330605482447Subject:Computer technology
Abstract/Summary:PDF Full Text Request
In the era of big data,search efficiency is of great significance in the fields of image retrieval,machine learning,recommendation systems,and semantic document retrieval.In recent years,the approximate nearest neighbor search(ANNS)methods based on proximity graphs have higher query efficiency than the methods based on tree,hash,and quantization,and these methods have attracted special attention.Nearest neighbor search methods based on proximity graphs include Navigable Small World(NSW),Hierarchical Navigable Small World(HNSW),NSG,etc.,where HNSW with long distance edge scaling and the hierarchical structure shows outstanding performance and has become a major research and comparison method in this field.However,HNSW has the following shortcomings:(1)It is hard to achieve distributed deployment,and it is difficult to practically apply it to large-scale dataset due to the hardware resources limitation;(2)The greedy algorithm used by HNSW has a problem of falling into local optimum;(3)It does not support dynamics multi-attribute filtering search;(4)Due to the multi-layer graph structure and the connected edge strategy,there is a problem of large memory overhead during searching.Based on the above-mentioned shortcomings of HNSW,contributions of this paper are as follows:(1)We proposes a hierarchical navigable small world graph method based on subgraph partitioning,called GP-HNSW,which can support distributed storage and query.The dataset is divided into multiple subsets by the clustering method,and each subset uses the HNSW graph structure to organize the data and can be stored independently;(2)A multi-attribute method MA-NSW based on subgraph partitioning is proposed to solve the the problem of dynamic multi-attribute filtering when searching.MA-NSW builds an index by combining a navigation tree and multiple overlays,and directs the nearest neighbor search for specific attribute filtering to the corresponding overlay;(3)Aiming to solve the problem of excessive memory overhead of MA-NSW,a quantization coding optimization method is proposed,called SQMA-NSW.Experiments show that it has a good compression effect.The above research results have experimentally verified that the query speed and recall rate are superior to HNSW.At the same time,it also has the advantages of easy parallel maintenance of adding or deleting nodes.Finally,based on the above research results,a platform for large-scale scientific document semantic searching is designed and implemented.The platform can quickly search related documents based on multi-attribute filtering and text content.The research in this paper will provide a good reference for the research and application of approximate nearest neighbor search methods.
Keywords/Search Tags:approximate nearest neighbor search, hierarchical navigable small world graph, multiple attributes, scalar quantization
PDF Full Text Request
Related items