Font Size: a A A

Optimized K-Means Hashing Quantization For Approximate Nearest Neighbor Search

Posted on:2020-06-19Degree:MasterType:Thesis
Country:ChinaCandidate:Y L ChenFull Text:PDF
GTID:2428330602958015Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the advent of the era of big data,the Internet is generating massive high-dimensional data,how to retrieve the target data quickly and accurately is an important issue in the computer field.This requires people to store and query the data index efficiently.ANN has become an effective technology of solving this problem in various fields.The ANN search techniques for high-dimensional is mainly divided into two categories,one is based on prototypes,such as product quantization(PQ),additive quantization(AQ),etc.The other is based on projection hashing,such as local sensitive hash(LSH),Iterative quantization(ITQ),etc.Both two types of methods are developing rapidly,and each of them has its own advantages in different application scenarios.At the same code length,the prototypes methods is usually more accurate than the projection hashing methods,but the distance calculation is slower.In recent years,prototypes based binary hashing methods have become research focus,such as k-means hash(KMH),adaptive binary quantization(ABQ),etc.These methods have high precision due to the idea of prototypes,and their neighbor search is very efficient as Hamming distance calculation and sorting.Besides,they are more advantageous when processing high-dimensional data.However,this type of method is limited by the binary representation of the hypercube characteristics.And the quantization result can not match with the real data distribution with a large quantization error.This paper delved into the typical method of binary hashing:K-means hash(KMH),and proposed Optimized K-means Hashing(OKMH)model aiming at the shortcomings of KMH with large quantization error.Furthermore,the adaptive coding technique is added to the above optimization model to obtain a more refined binary code subset and codebook,this paper proposed Refined Optimized K-means Hashing(ROKMH),which further reduces the quantization error of OKMH.Finally,the proposed OKMH and ROKMH were evaluated on four public datasets(SIFT,CIFAR-320,CIFAR-512 and CALTECH256),and compared with several state-of-the-art binary algorithms such as LSH,KMH,ABQ,etc.In the experiment,the evaluating indicators(quantization error,recall rate,accuracy rate and MAP value)have shown that the proposed methods outperform many binary hashing methods by about 10%.
Keywords/Search Tags:ANN, Massive high-dimensional, KMH Quantization, Optimized Model
PDF Full Text Request
Related items