Font Size: a A A

Hashing For Large Scale Similarity Search

Posted on:2016-09-02Degree:MasterType:Thesis
Country:ChinaCandidate:K ZhaoFull Text:PDF
GTID:2308330476453322Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The explosive growth of the multimedia data (including texts, images and videos) on the internet has posed a great challenge to many applications in terms of fast simi-larity search. Content Based Image Retrieval is one representative example. To handle this problem, Hashing based approximate nearest neighbor search techniques have re-cently become more and more popular because of their improvements in computational speed and storage reduction.This paper focuses on the Hashing algorithms for large scale similarity search. It begins with optimizing and improving the traditional Spectral Hashing. Then it propos-es new Hashing model to deal with the problems that traditional Hashing framework may encounter. Finally, it shows a new method to model the semi-supervised Hashing problem to improve the retrieval accuracy. The contributions and innovation of this paper include:(1) Local Linear Spectral Hashing. It improves the Spectral Hashing from the fol-lowing perspectives:(1) Spectral Hashing only takes neighbor relations into con-sideration. The proposed method considers not only neighbor relations, but also non-neighbor relations; (2) Spectral Hashing needs to compute an x n matrix, which is very time-consuming when n is very large. The proposed method uses a. m×m (m<<n) local matrix instead; (3) Spectral Hashing assumes that data distributes uniformly, and it uses a little complicated solution. In most cases, the assumption of uniform data distribution is usually not true. The proposed method ignores this assumption, and adopts a simple linear model to solve the proposed problem. Experiments demonstrate the proposed methods are simple and efficient.(2) Locality Preserving Hashing. In general, Hashing methods will take two step- s:projection+quantization. In projection stage, several projected dimensions are generated in certain ways. Then in quantization stage, the projected values will be quantized into binary codes. These methods may destroy the locality structure of dataset that is preserved in projection stage, since them usually di-rectly thresholding the projected values to get the binary codes. By contrast, the proposed method combines projection and quantization stage, by using a joint optimization framework to complete projection and quantization simultaneous-ly. Consequently, the locality structure of the data in the feature space can be well preserved. The experimental results show the proposed locality preserved strategy is more reasonable.(3) Locality Preserving Discriminative Hashing. Hashing algorithms can be divided into three categories:supervised, semi-supervised and unsupervised. The semi-supervised methods have a promising performance since it integrates labeled and unlabeled data. One representative approach is Semi-supervised Hashing. How-ever, it only considers the pair-wise relations of labeled data, ignoring the global labeled information. What’s worse, it fails to preserve the neighbor relationship. Differently, the proposed method combines Linear Discriminant Analysis and Locality Preserving Projection to preserve the local and global structures of data simultaneously. The experiments on three standard datasets show the effective-ness and efficiency of the proposed method.
Keywords/Search Tags:Similarity Search, Image Retrieval, Linear Projection, Locality Structure, Hashing Algorithms
PDF Full Text Request
Related items