Research On Large Scale Multimedia Data Retrieval Based On Hashing Algorithm

Posted on:2016-09-29

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z M Jin

Full Text:PDF

GTID:1108330470967834

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Nearest neighbor search is a fundamental problem in machine learning community, which has been applied in various fields, including information retrieval, data mining and computer vision. For example, nearest neighbor search can be used in image search, face recognition and k-means clustering. Recently, large scale multimedia data has appeared with the growth of internet. Generally, the features extracted from these datum are high dimensional and dense. Thus, how to conduct nearest neighbor search efficiently in the large scale multimedia dataset is considered as a very important research topic in the academia and industry. Currently, nearest neighbor search has two main directions, tree based method and hashing based method. Many tree based methods partition the feature space in the manner of tree, its time complexity and space complexity exponentially grow with the increase of dimension d. In the contrary, hashing algorithm does not have this limitation. It generates hash functions (preserving similarity, i.e. if two points are similar in the original feature space, then they have small hamming distance in the hamming space, and vice versa.) and utilizes them to compress a d dimensional feature into a c-bits (generally, c<<d) binary code, then performs search by using hamming ranking or hash table. It is obvious that one can use hardware (XOR) to accelerate the procedure of hamming ranking, and the time complexity of hash table based search is O(1) (is not related with d). Thanks to the advantages of high computational efficiency and insensitivity of dimension, hashing algorithm attracts interest from many researchers. This thesis is mainly about the problem of hashing based nearest neighbor search for large scale multimedia data.(1) Efficient search based hash function learning:Hashing based nearest neighbor search method is fundamentally based on hash function. A good hash function aims at obtaining high precision with a short code. Random projection based hashing algorithm does not consider the data distribution and have to generate long code to produce a satisfactory result. And learning based hashing algorithm learns projections from the data distribution and can achieve better performance with shorter code. We study the metrics of a hash function, and propose the Complementary Projection Hashing (CPH) algorithm, which preserves the nearest neighbor property (for obtaining high precision) and considers the hash bucket balance (for obtaining high speed) simultaneously. The proposed algorithm lays the foundation of hashing based high efficient search.(2) Cross-media data based hash function learning:It is important to develop the methodology of cross-media data based hash function learning in the era of large scale multi-media data (such as text, image, video, audio). It requires that the relevance between different kinds of media datum can be expressed within hash coding (i.e. the media datum with same concept must have same code). To this end, we propose the Iterative Multi-View Hashing (IMVH) algorithm considering both within-modal similarity preservation and cross-modal similarity preservation. The similarity preservation not only makes the similar points have similar codes, but also the dissimilar points have dissimilar codes (distinctiveness). The proposed method provides a general framework for hashing based cross-media efficient search.(3) Optimal hash table based fast nearest neighbor search:Currently, the performance of hash table based nearest neighbor search method is still not better than KD-tree based.method. The main reason is that the existing hashing algorithms have to use a large hamming radius for obtaining high precision and high recall, which enormously increases the search time. To address this limitation, we propose the Iterative Expanding Hashing (IEH) algorithm, which uses an auxiliary index to expand the points obtained from a small hamming radius, and finally achieves high precision, high recall and low search time. This method can essentially improve the performance of hash table based method, and provide a strong guarantee for online real-time large scale information retrieval.

Keywords/Search Tags:

Big Data, Nearest Neighbor Search, Hashing Algorithm

PDF Full Text Request

Related items

1	Approximate Nearest Neighbor Search For High-Dimensional Based On Nearest Neighbor Graph
2	Hash-based Approximate Nearest Neighbor Search For High-dimensional Data
3	Hashing-based Nearest Neighbor Search
4	Stacked Hashing Quantization Algorithm For Nearest Neighbor Search
5	Hashing Based Algorithms For Nearest Neighbor Search In High Dimensions
6	Locality Sensitive Hashing Index Based On Neighborhood Collision Counting
7	Research On Large Scale Multimedia Data Retrieval Based On Hashing Algorithm
8	Online Matrix Factorization Hashing For Nearest Neighbor Search
9	Hashing Algorithms For Multimedia Nearest Neighbor Search
10	Study On Approximate Nearest Neighbor Search Over Encrypted Data In Cloud Computing