Font Size: a A A

Hashing Based Algorithms For Nearest Neighbor Search In High Dimensions

Posted on:2014-09-29Degree:MasterType:Thesis
Country:ChinaCandidate:Y LinFull Text:PDF
GTID:2268330395489211Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Nearest neighbor search is a fundamental problem in various research fields like ma-chine learning, data mining, pattern recognition and computer vision. Recently, hashing-based approaches, e.g, Locality Sensitive Hashing, are proved to be effective for scalable high dimensional nearest neighbor search. Many hashing algorithms found their theoretic root in random projection. Since these algorithms generate the hash tables randomly, a large number of hash tables (i.e, long codewords) are required in order to achieve a satisfied per-formance. Another category of hashing methods is the learning based algorithms. They achieve satisfied performance in short codes, however, fail to make significant improve-ment as the length of the code increases.To address the limitations, we propose two novel hashing algorithms called Density Sensitive Hashing and Compressed Hashing in this thesis.(i)Density Sensitive Hashing can be regarded as an extension of Locality Sensitive Hashing. By exploring the geometric structure of the data, Density Sensitive Hashing avoids the purely random projections selec-tion and uses those projective functions which best agree with the distribution of the data, and then selects the final projection vectors according to the maximize entropy principle.(ii)Compressed Hashing explores the techniques of sparse coding and compressed sensing. We introduce a sparse coding scheme, based on the approximation theory of integral op-erator, that generates sparse representation for high dimensional vectors. We then project sparse codes into a low dimensional space by effectively exploring the Restricted Isometry Property, which is a key property in compressed sensing theory.Both of the theoretical analysis and the empirical study on data sets show that the proposed approaches scales well to large data sets and is significantly more effective than the state-of-the-art hashing algorithms for high dimensional nearest neighbor search.
Keywords/Search Tags:Nearest Neighbor, Hashing, Random Projection, Compressed Sensing
PDF Full Text Request
Related items