Font Size: a A A

Stacked Hashing Quantization Algorithm For Nearest Neighbor Search

Posted on:2019-03-24Degree:MasterType:Thesis
Country:ChinaCandidate:J ShiFull Text:PDF
GTID:2348330542489054Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of Internet,various types of data have exploded.Traditional information processing technologies are facing new challenges.How to retrieve target data in high-dimensional data efficiently is one of the hot issues in the computer field.Approximate nearest neighbor search is a solution to this problem.Its main idea is to propose a new approximate distance measure to retrieve the nearest data object.At present,various nearest neighbor algorithms have been proposed.Product Quantization is one of the effective methods,which has low memory consumption and high query efficiency.However,it needs to establish a lookup table of quantization centers,which makes the time complexity high.K-means Hashing Quantization is proposed later,which directly quantizes the vectors into binary codes and tries to keep the spatial neighbor structure.It saves storage and uptime because the calculation of Hamming distance is faster than Euclidean distance.However,it is to optimize a high-dimensional hypercube in the original space iteratively.If the cube dimension is high,the speed is slow and the memory consumption is relatively large.This paper presents a new quantization algorithm named Stacked Hashing Quantization.We use low-dimensional cubes to approximate the original data gradually.Firstly,a high-dimensional training set is divided into low-dimensional training sets by Product Quantization.Secondly,K-means Hashing Quantization is performed on every low-dimensional subspace.Thirdly,we calculate the remaining vector and use it to perform codebook training.Repeat the third step until a given error is reached or the number of layers is specified.Then use the hierarchical codebooks to encode the data of the database to get a multi-layer hash codes.In the online query phase,the query vectors are encoded by the hierarchical codebook set and then the query vectors and the vectors in the database are matched by hamming distance.In this paper,we design the experiment on the constructed SIFT 17 and classic SIFT1M datasets.Compared with classical quantification methods,this algorithm has advantages in terms of recall,precision,MAP,etc.
Keywords/Search Tags:Nearest Neighbor Search, High-dimensional Vectors, K-means, K-means Hashing
PDF Full Text Request
Related items