Font Size: a A A

Hashing-based Nearest Neighbor Search

Posted on:2016-06-14Degree:DoctorType:Dissertation
Country:ChinaCandidate:J F WangFull Text:PDF
GTID:1228330470458007Subject:Signal and Information Processing
Abstract/Summary:PDF Full Text Request
Recently, hashing-based algorithms have attracted lots of attention to deal with the nearest neighbor search, which is a fundamentally important problem in computer vi-sion and machine learning. The basic idea of hashing is to represent high-dimensional data by compact binary codes, and the similarity between the data are approximated by that between the codes. Due to the low memory cost and fast approximate distance computation, the hashing-based algorithm is promising to apply in the large scale ap-plications.Despite the advantage with regards to the memory and computational efficiency, there may be an approximation error by using the similarity between the codes rather than the original similarity. The error issue is currently not well solved, and thus this dissertation focuses on improving accuracy from various aspects.Generally, there are two phases for hashing-based nearest neighbor search. The first is hashing, which maps the data into the binary codes, and the second is to evaluate the similarity based on the codes for binary code ranking. To improve the accuracy of hashing-based nearest neighbor search, we study how to achieve better binary codes and effective approximate distance.Novel algorithms in this dissertation are as follows.1. We propose the order preserving hashing for the Hamming-based approach, and the key idea is to learn hash functions by maximizing the alignment between the similarity orders computed from the original space and the ones from the Ham-ming space. The problem of mapping the points into different hash codes is taken as a classification problem in which the points are categorized into several groups according to the Hamming distances to the query. The hash functions are opti-mized from the classifiers pooled over the training points. The key novelty lies in the order preserving which is a critical factor for the nearest neighbor search. Extensive experiments on standard datasets have shown superior accuracy for nearest neighbor search over existing Hamming-based hashing algorithms with the same response time.2. We propose the Optimized Cartesian K-Means for the quantization-based map-ping to encode the data points with a low distortion error and a high ranking accuracy. In the traditional approaches, the codebook is the Cartesian product of multiple sub codebooks, while in our proposed approach, multiple such code-books are used and the datum is encoded as the aggregation of the codewords, each of which comes from the corresponding codebook. These codebooks are optimally learned with regards to the minimization of the distortion errors. This simple strategy can provide low distortion errors than traditional methods theo-retically and practically, and improves the accuracy of nearest neighbor search from extensive experiments.3. We propose a practical solution to optimize the distances for binary code to im-prove the ranking accuracy. Specifically, we employ a non-parametric distance lookup table to store the distance between the query and any binary code due to the finiteness of the codes. To address the potential memory issue, we split the codes into multiple sub codes, each of which generates a query-dependent small distance lookup table. Then the approximate distance is constructed as the sum-mation of the distances from all sub codes by looking up their respective tables. The entries of the tables are optimized by minimizing the misalignment between the approximate distance and the original distance. Such a scheme is applied to both the symmetric distance between the binary codes and the asymmetric dis-tance between the query and the binary code. Due to the precise approximate distance guaranteed in theory, extensive experiments demonstrate a high ranking accuracy by distance optimization.In a nutshell, we propose three novel algorithms to improve the accuracy of hashing-based nearest neighbor search from the perspective of binary code encoding and binary code ranking. Theoretically and experimentally, the proposed approaches show supe-rior and promising accuracy over the existing state-of-the-arts.
Keywords/Search Tags:Nearest Neighbor Search, Hashing Functions, Order Preserving Hashing, Optimized Cartesian K Means, Optimized Distances
PDF Full Text Request
Related items