With the rapid development of computer technology and Internet,the world has begun to enter the era of big data.Big data brings abundant information to people and also challenges many traditional fields.Similarity query means querying similarity between query and given object,in which approximate nearest neighbor query is an important branch of similarity query.Approximate nearest neighbor query at the expense of a small amount of query accuracy in exchange for faster query speed,which have a wide range of applications in image retrieval and other fields.In the process of querying,due to the "dimensionality disaster",the query based on partitioning can not effectively solve the approximate querying problem of high dimensional data.Therefore,it is necessary to study the approximate nearest neighbor algorithm.Locality sensitive hashing has drawn more and more attention in the research of approximate nearest neighbor algorithm.Locality sensitive hashing hashs similar objects to the same hash bucket with high probability.In the process of querying,only the exact similarity of the data with the same hash is compared,so that most of the less similar data is filtered and the query result can be quickly obtained.In order to improve the recall of the algorithm,multiple hash tables are used in E2LSH.However,the data in the hash bucket of each hash table is affected by the original data distribution.data in the hash bucket is uneven.The imbalance data in the hash bucket results in a decrease of retrieval efficiency.In order to solve these problems in the research of approximate nearest neighbor algorithm,the main work of this paper is as follows:(1)We have researched a locality sensitive hashing called BILSH.Firstly,we use the sparse subspace clustering method to cluster the data.Then,we use E2LSH in each class after clustering,which makes the data distribution in the hash bucket more balanced and improves the retrieval efficiency.Moreover,a method of depth optimization is proposed.In the query process of BILSH,the steps of calculating the distance between high-dimensional data are optimized,which reduces the calculation of invalid distance and further speeds up the query speed in retrieval.(2)We have proposed a coding-based hash method to retrieve nearest-nearest neighbors from high-dimensional data LPIH.First of all,constructing the adjacency graph and minimizing the distance between the neighbors in the low-dimensional space makes the projected matrix maintain high-dimensional neighborhoods,which solves the problem that other hashing methods can not effectively preserve high-dimensional neighborhoods.Finally,entropy quantization is used to reduce the loss of the matrix after the projection in the quantization,Moreover,a hybrid distance measurement method is used in the quantization.(3)We have proposed an approximate nearest neighbor retrieval model with the high dimension data.Experiments of proposed algorithm in the public dataset Caltech and Cifar show that proposed algorithm have a better performance then original algorithm,which have verified that the proposed model and algorithm can effectively deal with the nearest neighbor retrieval problem. |