
Research On Hashing Methods Of Approximate Nearest Neighbor Searching On Big Data

Posted on: 2022-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Qin
Full Text: PDF
GTID: 2518306539462964
Subject: Software engineering

Abstract/Summary:
With the advent of the Internet, multimedia data has grown explosively in both quantity and dimensionality. How to capture the correlations among data and find similar information in a massive collection is a challenging problem. Hashing, which seeks approximate rather than exact nearest neighbors, has become an increasingly popular technique for similarity search and a fundamental paradigm for large-scale data retrieval. Hashing methods can be roughly divided into two categories: unimodal hashing and cross-modal hashing. Unimodal hashing measures similarity within a single modality (such as images) to find data points close to a query. By contrast, cross-modal hashing exploits the inherent similarities among heterogeneous data of different types, so that a data point in one modality (e.g., an image) can be used to retrieve similar points in other modalities (e.g., text), providing users with hybrid search results. This thesis studies both types of hashing methods. First, we introduce the concepts and theoretical foundations of hashing, including similarity measures, loss functions, discrete learning, and out-of-sample learning. Then, we propose several effective and efficient hashing methods for large-scale retrieval. The main contributions of this thesis are as follows:

To address the difficulty of adaptively learning discriminative features, we propose an unsupervised graph-based hashing method with adaptive sparse learning for unimodal similarity search. Specifically, we first apply k-means clustering to obtain a graph Laplacian model that describes the pairwise similarities among data points. Meanwhile, we enhance the sparsity of the graph learning model with a re-weighted ℓ1-norm, so that it can adaptively select sparse and discriminative manifold features for similarity measurement. A linear regression model then quantizes the real-valued feature representation into a compact binary representation. Experimental results demonstrate that the proposed method benefits feature learning.

To boost the discriminative power of subspace learning, we propose a supervised discrete semantic matrix factorization hashing method for cross-modal similarity search. We first embed label information into collective matrix factorization to learn a semantic common subspace for heterogeneous data. Then, we simultaneously learn compact hash codes and the corresponding hash functions by casting the matrix factorization as a discrete optimization problem. Moreover, an alternating iterative optimization procedure based on bit-by-bit learning is introduced to solve the discrete problem efficiently. Compared with most cross-modal hashing methods, the proposed method achieves higher retrieval accuracy on several public datasets.

To overcome the difficulties of both heterogeneous semantic learning and discrete optimization, we propose a supervised, scalable, discriminative discrete hashing method for cross-modal similarity search. First, we use orthogonal and balanced collective matrix factorization to learn a shared binary representation for heterogeneous data. As a result, the learned binary descriptors become discriminative by fully exploiting local manifold structure and preserving heterogeneous similarity. Then, we embed label semantics into the hash codes via a linear regression model to capture both the intra-class similarities and the inter-class differences of heterogeneous data. Unlike conventional hashing methods that rely on bitwise discrete optimization, an efficient optimization procedure obtains all bits of the hash codes in one step, which greatly reduces the time complexity of discrete optimization. Theoretical analyses and experimental results demonstrate that the proposed method is effective for both heterogeneous feature learning and discrete optimization.

Most existing subspace-learning-based cross-modal hashing methods neglect the private characteristics of each modality. To address this, we propose a new supervised cross-modal hashing method with joint specificity and consistency learning. Specifically, we decompose the data of each modality into a modality-specific latent representation via an orthogonal and balanced matrix factorization, preserving intra-modality similarity. Then, an asymmetric distance-distance difference minimization model supervises inter-modality similarity preservation across the different specific representations. After that, we jointly embed the specific and consistent components into a common hash code by combining label information with the multiple specific representations. Moreover, an alternating iterative optimization procedure solves the joint learning problem with low space and computational complexity. Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art cross-modal hashing methods in both retrieval accuracy and computational efficiency.

In summary, we systematically study two types of hashing methods, unimodal and cross-modal, and carefully design a series of hashing methods based on their intrinsic characteristics. Comparative experiments on several widely used benchmark datasets validate the effectiveness and efficiency of the proposed methods.
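The pipeline shared by all of these methods — learn a projection, quantize the real-valued representation to binary codes with sign(), and rank by Hamming distance — can be sketched with a classic PCA-hashing baseline. This is an illustrative stand-in, not any of the thesis's own algorithms; the dataset, bit width, and variable names are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 1000 database points and 5 queries that are tiny
# perturbations of the first 5 database points, so each query's true
# nearest neighbor is known in advance.
X = rng.standard_normal((1000, 64))
queries = X[:5] + 1e-3 * rng.standard_normal((5, 64))

# Learn a linear hash function with PCA: project centered data onto the
# top n_bits principal directions. (A standard unsupervised baseline,
# not the thesis's adaptive sparse graph method.)
n_bits = 32
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
W = Vt[:n_bits].T                                  # (64, n_bits)

def encode(A):
    """Quantize real-valued features into compact binary codes."""
    return ((A - mean) @ W > 0).astype(np.uint8)

codes = encode(X)         # database hash codes, shape (1000, n_bits)
qcodes = encode(queries)  # query hash codes, shape (5, n_bits)

# Approximate nearest neighbor search: rank database points by Hamming
# distance (number of differing bits) to each query's code.
hamming = (qcodes[:, None, :] != codes[None, :, :]).sum(axis=2)
top1 = hamming.argmin(axis=1)
print(top1.tolist())  # with tiny noise, each query recovers its source point
```

Because binary codes are compact (32 bits per point here) and Hamming distance reduces to XOR-and-popcount on packed integers, this ranking scales to collections where exact Euclidean search is impractical — which is the motivation for all of the methods above.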
Keywords/Search Tags: learning to hash, unimodal retrieval, cross-modal retrieval, similarity search, discrete optimization