Font Size: a A A

Research On Optimal Parameter Selection Of Locality-sensitive Hash Algorithm

Posted on:2022-02-23Degree:MasterType:Thesis
Country:ChinaCandidate:H F HuFull Text:PDF
GTID:2518306497472394Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
In the field of database retrieval,Nearest Neighbor search(NN)is a very basic retrieval problem,which has been one of the most popular research directions in the field of database.It has a wide range of applications in pattern recognition,computer vision,image retrieval and so on.In Euclidean space,NN is to find the data object which has the smallest distance with the query among all the data objects in the given dataset.In high-dimensional space,it is very expensive to find the nearest neighbor of query points directly.As an alternative,Approximate Nearest Neighbor search(ANN)has become one of the hot research directions in recent years.Locality-Sensitive Hashing(LSH)is a very effective algorithm to solve the problem of ANN.The algorithm can find the ANN of the query with a certain probability and has been widely used in practical system.The algorithm needs to select the appropriate hash function in advance,and sets its total number m and filtering threshold l.With a certain search strategy,the algorithm can find the approximate nearest neighbor for the query.Whether the m&l is appropriate or not is the key to the efficiency and accuracy of the algorithm.However,for a specific data set and approximate ratio c,the classical QALSH algorithm uses fixed m&l to solve different kNN problems of different queries,which will lead to additional query cost and reduce the accuracy.This paper proposes an algorithm based on deep learning technology,which can predict the appropriate m&l for different queries under specified parameters(approximate ratio c,the k in kNN problem,etc.).And the predicted m&l is used to find ANN for the query.The main contributions of this paper are as follows:(1)This paper proposes an algorithm based on deep learning technology and LSH.For the training of the constructed deep learning model,this paper adopts a two-stage training strategy.In the first stage of training,the deep learning model can take the data distribution of the dataset into account.In the second stage,the deep learning model can obtain the relationship between the queries points and c,k and other parameters in the kNN problem.Through the training,the model can make reasonable prediction and the appropriate m&l is obtained.(2)The classical QALSH algorithm can support the retrieval with approximate ratio of c>1,but can not support the search of c=1.This paper presents an algorithm QALSH_TC.The algorithm uses a new bucket expansion strategy and termination condition,which enables it to support the search of c?1 and can reduce the additional collision counting and achieve better query results.In this paper,QALSH TC is used to obtain experimental data of deep learning model and carry out collision detection after deep learning model prediction.(3)For each given data set,a new training set is established.In the process of constructing the training set,this paper enumerates the appropriate m&l in a reasonable range for a query with specified parameters.Then,QALSH TC algorithm is used to obtain the corresponding Recall and 10.According to the recall priority principle,the appropriate m&l is obtained for the query with these parameters.The query point,the specified parameters and the appropriate m&l constitute a record in the training set.(4)Through a large number of experiments,the effectiveness of the proposed algorithm based on deep learning technology and LSH is proved.
Keywords/Search Tags:LSH, Deep Learning, Nearest Neighbor Search, Parameter Optimization
PDF Full Text Request
Related items