Stacked Hashing Quantization Algorithm For Nearest Neighbor Search

Posted on:2019-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:J Shi

Full Text:PDF

GTID:2348330542489054

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

With the rapid development of Internet,various types of data have exploded.Traditional information processing technologies are facing new challenges.How to retrieve target data in high-dimensional data efficiently is one of the hot issues in the computer field.Approximate nearest neighbor search is a solution to this problem.Its main idea is to propose a new approximate distance measure to retrieve the nearest data object.At present,various nearest neighbor algorithms have been proposed.Product Quantization is one of the effective methods,which has low memory consumption and high query efficiency.However,it needs to establish a lookup table of quantization centers,which makes the time complexity high.K-means Hashing Quantization is proposed later,which directly quantizes the vectors into binary codes and tries to keep the spatial neighbor structure.It saves storage and uptime because the calculation of Hamming distance is faster than Euclidean distance.However,it is to optimize a high-dimensional hypercube in the original space iteratively.If the cube dimension is high,the speed is slow and the memory consumption is relatively large.This paper presents a new quantization algorithm named Stacked Hashing Quantization.We use low-dimensional cubes to approximate the original data gradually.Firstly,a high-dimensional training set is divided into low-dimensional training sets by Product Quantization.Secondly,K-means Hashing Quantization is performed on every low-dimensional subspace.Thirdly,we calculate the remaining vector and use it to perform codebook training.Repeat the third step until a given error is reached or the number of layers is specified.Then use the hierarchical codebooks to encode the data of the database to get a multi-layer hash codes.In the online query phase,the query vectors are encoded by the hierarchical codebook set and then the query vectors and the vectors in the database are matched by hamming distance.In this paper,we design the experiment on the constructed SIFT 17 and classic SIFT1M datasets.Compared with classical quantification methods,this algorithm has advantages in terms of recall,precision,MAP,etc.

Keywords/Search Tags:

Nearest Neighbor Search, High-dimensional Vectors, K-means, K-means Hashing

PDF Full Text Request

Related items

1	Research On Hashing Accelerated Approximate Nearest-Neighbor Search
2	Hashing-based Nearest Neighbor Search
3	Hash-based Approximate Nearest Neighbor Search For High-dimensional Data
4	Optimized K-Means Hashing Quantization For Approximate Nearest Neighbor Search
5	Approximate Nearest Neighbor Search For High-Dimensional Based On Nearest Neighbor Graph
6	Locality Sensitive Hashing Index Based On Neighborhood Collision Counting
7	Research On Approximate Nearest Neighbor Search And Maximum Inner Product Search For High-dimensional Dataset
8	Hashing Based Algorithms For Nearest Neighbor Search In High Dimensions
9	Approximate Nearest Neighbor Search Based On Based On Hashing And Clustering For High Dimensional Data
10	Research On The High-Efficient K-Nearest Neighbor Algorithm And Its Parallelization Of MPI