Font Size: a A A

Research On Approximate Nearest Neighbor Search And Maximum Inner Product Search For High-dimensional Dataset

Posted on:2022-10-18Degree:MasterType:Thesis
Country:ChinaCandidate:X ZhaoFull Text:PDF
GTID:2518306572990839Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Nearest neighbor search is inherently computationally expensive in high-dimensional spaces due to the curse of dimensionality.As a well-known solution,locality-sensitive hashing(LSH)is able to answer approximate NN(ANN)queries in sublinear time with constant probability.However,existing coarse-grained structures fail to offer accurate distance estimation for candidate points,which translates into additional computational overhead when having to examine unnecessary points.This in turn reduces the performance of query processing.In contrast,we propose a fast and accurate in-memory LSH framework,called PM-LSH,that aims to compute the ANN query on large-scale,high-dimensional datasets.First,we adopt a simple yet effective PM-tree to index the data points.Second,we develop a tunable confidence interval to achieve accurate distance estimation and guarantee high result quality.Third,we propose an efficient algorithm on top of the PM-tree to improve the performance of computing ANN queries.In addition,we use the LSH method to solve another problem,called maximum inner product search(MIPS),in high-dimensional spaces.Since inner product is not a metric,which makes the LSH methods fail to deal with a MIPS problem directly,we first adopt propose a novel asymmetric transformation function to convert the MIPS problem into an ANNS problem under cosine space,which eliminate the issue in the exsiting methods.Then,we propose an efficient framework,called global multi-probe(GMP),to answer the converted ANNS problem.Finally,we develop a novel adaptive early termination(AET)for GMP,which adaptively determines search termination conditions for individual queries.To examine the performance of PM-LSH and GMP,we compared their efficiency and accuracy on totally 11 commonly used real data sets with competitors.Extensive experiments with real-world data offer evidence that PM-LSH and GMP are capable of outperforming existing proposals with respect to both efficiency and accuracy.
Keywords/Search Tags:Locality-Sensitive Hashing, Nearest Neighbor Search, Maximum Inner Product Search
PDF Full Text Request
Related items