
Learning To Hash For Large-scale Cross-modal Retrieval

Posted on: 2022-04-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y X Wang
GTID: 1488306311967349
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the growing maturity of mobile Internet technology, multimedia data has grown explosively. To explore multimedia data more comprehensively and help users obtain valuable information from massive and cluttered data, the demand for large-scale cross-modal retrieval is increasing. Compared with traditional uni-modal retrieval, cross-modal retrieval can greatly improve the user experience and better fit real-world application scenarios. However, multimedia data are characterized by large scale, complex structure, and high dimensionality, and suffer from the heterogeneity gap and semantic gap between different modalities, all of which pose great challenges to large-scale cross-modal retrieval. Learning to hash, a typical approximate nearest neighbor search technique, has attracted more and more attention due to its low storage consumption and efficient search. Although some progress has been made in cross-modal hashing research, many problems remain to be solved.

1) How to achieve efficient discrete optimization of binary hash codes for different modalities. Some methods use a relaxation strategy when handling the discrete constraints of the hash codes, leading to large quantization errors and low-quality hash codes that cannot well bridge the heterogeneity gap in cross-modal retrieval. Other discrete optimization algorithms rely on complex gradient descent or bit-by-bit optimization strategies with very low learning efficiency.

2) How to fully exploit the large amount of information contained in heterogeneous multimedia data. Some methods consider only the global information of the data and ignore the local similarity hidden in the data distribution, which makes the retrieval results insufficiently fine-grained.

3) How to achieve fine similarity preserving for hash codes. Existing hashing methods usually embed a binary similarity into the hash codes, which loses a large amount of semantic and multi-modal feature information and suffers from quadratic complexity. In addition, when solving the similarity preserving problem, the expressiveness of general binary codes is limited by the code length, so they cannot adequately fit the fine similarity information of the data.

4) How to achieve efficient online learning over multimedia streams. In more and more application scenarios, multi-modal data are collected in the form of data streams. Traditional cross-modal hashing methods learn hash codes and hash functions of different modalities in batch mode, which is very inefficient and cannot be well adapted to online cross-modal retrieval tasks.

This thesis conducts an in-depth study of learning to hash for large-scale cross-modal retrieval and designs four supervised cross-modal hashing methods to address the above problems. The main contributions of this thesis are summarized as follows.

(1) A scalable asymmetric discrete cross-modal hashing method is proposed to address the large quantization errors and poor scalability to large-scale data of existing cross-modal hashing methods. It applies a distance-distance difference minimization to embed the supervised information of multi-modal data into the hash codes, avoiding the explicit binary similarity matrix, thereby reducing time and space costs and making the model scalable to large-scale multimedia datasets. The semantic labels, the most consistent information across all modalities, are treated as a special modality, and a collective matrix factorization technique is applied to learn a common latent subspace shared with the other modalities; the hash codes are connected to this subspace by an asymmetric strategy to transfer more information. An efficient asymmetric discrete optimization algorithm is also proposed to solve the binary constraints of the hash codes, which avoids quantization errors and ensures the quality of the hash codes.
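To illustrate the general flavor of such an asymmetric discrete update, the toy NumPy sketch below keeps a real-valued latent embedding (built from features and labels) on one side and binary codes on the other, and obtains the codes in closed form with a sign operation instead of relaxing the binary constraint. All variable names, dimensions, and the particular objective are illustrative assumptions, not the exact formulation proposed in the thesis.

    import numpy as np

    # Toy sketch of an asymmetric discrete code update (illustrative assumptions only,
    # not the objective proposed in the thesis).
    rng = np.random.default_rng(0)
    n, d, c, r = 1000, 64, 10, 32          # samples, feature dim, label dim, code length

    X = rng.standard_normal((n, d))                 # features of one modality
    L = rng.integers(0, 2, (n, c)).astype(float)    # multi-label supervision ("special modality")

    W = rng.standard_normal((d, r))        # modality-specific projection (assumed learned elsewhere)
    P = rng.standard_normal((c, r))        # label projection (assumed learned elsewhere)

    # Asymmetric pairing: real-valued latent embedding on one side, binary codes on the other.
    V = X @ W + L @ P                      # continuous latent embedding of the training data
    B = np.sign(V)                         # discrete update via sign(): no relaxation step
    B[B == 0] = 1                          # keep codes strictly in {-1, +1}
    print(B.shape, np.unique(B))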
(2) To exploit the information in multimedia data more fully, a fast cross-modal hashing method with global and local similarity embedding is proposed. It not only considers the global similarity information of heterogeneous data but also mines the local similarity information within groups of data, which makes the retrieval results visually more refined. To better utilize the supervised information, a similarity embedding framework combining pairwise similarity preserving and related label reconstruction is designed to maintain the supervised information from two perspectives, leading to more discriminative hash codes. An efficient discrete optimization algorithm is also proposed, together with a group-wise updating scheme, so that the computational complexity is linear in the size of the training set and the scalability to large-scale multimedia data is greatly improved.

(3) A high-dimensional sparse cross-modal hashing method is proposed for the fine similarity preserving problem of hash codes. A fine-grained similarity is theoretically analyzed and designed to consider, in a reasonable way, not only the high-level semantic similarity of the data but also the low-level multi-modal feature similarity. Because general hash codes are not expressive enough to fit this fine-grained similarity well, strong high-dimensional sparse coding is used to embed the fine-grained similarity into the hash codes to be learned. An efficient discrete optimization algorithm is also designed to solve the binary and sparse constraints of the hash codes, which reduces the quantization error. Most importantly, the search complexity of the model is the same as that of general hashing models. Extensive experiments on three widely used datasets show that the proposed high-dimensional sparse cross-modal hashing model is very effective and efficient.

(4) A label embedding online cross-modal hashing method is proposed for online cross-modal retrieval scenarios. A label embedding framework is designed to exploit the supervised information of the data, which can generate highly discriminative hash codes and reduce the computational complexity. Through inner-product fitting of a block similarity matrix, the pairwise similarity of newly arrived data is maintained and the connection between newly arrived data and existing data is established, so that the sensitivity of the model to newly arrived data is reduced and effective hash codes can be obtained. In addition, a discrete optimization algorithm is designed to solve the binary constraint problem of hash codes without relaxation, which reduces the quantization error; its computational complexity is linear only in the size of the new arrivals, making it very efficient and scalable for large-scale multimedia datasets. Extensive experimental results on three benchmark datasets show that the proposed model outperforms state-of-the-art offline and online cross-modal hashing methods in terms of accuracy and efficiency.
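As a rough illustration of the online setting described in (4), the sketch below processes a data stream chunk by chunk: codes of previously seen data are frozen, codes of the new chunk are obtained in closed form from a label-embedding projection, and a block similarity matrix between new and old data is fitted by a code inner product. The loop structure, variable names, and the similarity definition are assumptions made for illustration, not the thesis's actual algorithm.

    import numpy as np

    # Illustrative online update loop (assumed structure, not the algorithm from the thesis):
    # only the newly arrived chunk is processed at each round, so the cost per round is
    # linear in the chunk size rather than in the total amount of accumulated data.
    rng = np.random.default_rng(1)
    r, c = 32, 10                          # code length and number of classes (illustrative)
    P = rng.standard_normal((c, r))        # label-embedding projection, refined over time

    B_old = L_old = None
    for t in range(5):                     # stream of data chunks
        L_new = rng.integers(0, 2, (200, c)).astype(float)   # labels of the new chunk
        B_new = np.sign(L_new @ P)         # closed-form codes for the new chunk only
        B_new[B_new == 0] = 1
        if B_old is not None:
            # Block similarity between new and old data (label overlap -> +1/-1),
            # fitted by the scaled inner product of new and old codes.
            S_block = np.where(L_new @ L_old.T > 0, 1.0, -1.0)
            fit_err = np.linalg.norm(S_block - B_new @ B_old.T / r)
            print(f"round {t}: block-similarity fitting error = {fit_err:.2f}")
        B_old = B_new if B_old is None else np.vstack([B_old, B_new])
        L_old = L_new if L_old is None else np.vstack([L_old, L_new])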
Keywords/Search Tags:Cross-Modal Retrieval, Learning to Hash, Discrete Optimization, Similarity Preserving, Scalable Hashing