
Research On Controlled Semantic Embedding And Deep Mutual Information For Cross-modal Hashing

Posted on: 2022-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: R Yang
Full Text: PDF
GTID: 2518306782952499
Subject: Automation Technology
Abstract/Summary:
In recent years, with the development of the Internet, a large amount of multimedia data (e.g., images, text, and video) has been generated, and these data are usually stored in databases. Finding semantically related information in these databases is a challenging task, and information retrieval is the means of solving it. Cross-modal hashing, one branch of information retrieval, offers fast computation and high storage efficiency and is the most promising solution for information retrieval over large-scale multimedia data. Starting from three subtasks of cross-modal hashing and combining techniques such as controlled semantic embedding and deep mutual information, this thesis conducts in-depth research on the semantic gap, learning to rank, and quantization. The main research contents are as follows.

The first innovation of this thesis addresses the first subtask, the semantic gap. Most existing methods map heterogeneous features directly into a common subspace, which inevitably produces highly entangled representations and prevents them from bridging the modality gap. This thesis presents a novel deep framework, Learning Controlled Semantic Embedding for Cross-Modal Retrieval, which attempts to learn disentangled representations with a controlled semantic structure for cross-modal retrieval. The proposed method constructs a conditional variational autoencoder and a discriminator for each modality separately and enhances the variational autoencoder with a discriminator-driven feedback mechanism. Benefiting from this feedback, the generative network learns more interpretable semantic representations for the different modalities by exploiting the rich semantic information in the training samples and excluding information irrelevant to the retrieval task. Combined with the self-supervised semantic information provided by the label network, the model learns disentangled representations with a controlled semantic structure before the sample features are mapped into the common semantic space.
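To make the mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of a conditional variational autoencoder for one modality whose reconstructions are scored by a discriminator, with the discriminator's response fed back into the VAE objective. It illustrates the general idea only; the layer sizes, module names, and loss weights are assumptions and do not reproduce the thesis's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalVAE(nn.Module):
    """Conditional VAE for one modality: encodes a feature conditioned on its label vector."""
    def __init__(self, feat_dim=4096, label_dim=24, latent_dim=128):  # dimensions are illustrative
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim + label_dim, 1024), nn.ReLU())
        self.mu = nn.Linear(1024, latent_dim)
        self.logvar = nn.Linear(1024, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + label_dim, 1024), nn.ReLU(),
            nn.Linear(1024, feat_dim))

    def forward(self, x, y):
        h = self.encoder(torch.cat([x, y], dim=1))                # condition on labels
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        x_hat = self.decoder(torch.cat([z, y], dim=1))
        return x_hat, mu, logvar, z

class Discriminator(nn.Module):
    """Scores how realistic a (reconstructed) feature looks; trained separately on real vs. fake."""
    def __init__(self, feat_dim=4096):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, 1))

    def forward(self, x):
        return self.net(x)

def generator_step(vae, disc, x, y, beta=1.0, gamma=0.1):
    """One VAE update: reconstruction + KL + a discriminator-driven feedback term."""
    x_hat, mu, logvar, z = vae(x, y)
    recon = F.mse_loss(x_hat, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    logits = disc(x_hat)
    # Feedback: reconstructions should look real to the discriminator, which pressures
    # the latent code z to retain label-relevant semantics and drop irrelevant detail.
    feedback = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return recon + beta * kld + gamma * feedback, z   # z plays the role of the controlled embedding
```

In such a setup, one such pair of networks would be built per modality, and the latent codes z from both modalities would then be aligned in the common semantic space, in the spirit of the framework described above.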
The second innovation focuses on designing a reasonable ranking loss to optimize the learning-to-rank problem in the common subspace. By modeling the learning-to-rank subtask of cross-modal hashing as the problem of separating the distance distributions of positive and negative samples, this thesis presents a novel deep framework called Hashing with Deep Mutual Information Maximization. Built on a Mutual Information Neural Estimator, the proposed method can estimate and optimize the mutual information between the distance distributions of positive and negative samples generated during ranking learning. Combined with the self-supervised semantic information provided by the label network, the proposed mutual-information objective is easily extended to multiple modalities and accurately describes the learning-to-rank problem in cross-modal hashing. The whole process requires neither carefully constructed hyperparameters for the ranking loss nor heuristic sampling to avoid the hard-sample problem.

This thesis conducts comprehensive experiments on four benchmark datasets: MIRFLICKR-25K, NUS-WIDE, MS-COCO, and IAPR-TC12. The proposed approaches and 9 comparison algorithms are evaluated with hash code lengths of 16, 32, and 64 bits. The first innovation improves the average mAP by more than 1%, 3%, and 9% on the three datasets, and the second innovation improves the average mAP by more than 2%, 4%, 14%, and 6% on the four datasets. The experimental results verify the superiority of the proposed methods over existing cross-modal hashing retrieval algorithms.
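As an illustration of the mutual-information-style ranking objective described in the second innovation above, here is a small, hypothetical PyTorch sketch that applies a MINE-style statistics network to the distances of positive and negative cross-modal pairs and maximizes a Donsker-Varadhan bound to push the two distance distributions apart. The cosine-distance choice, network shapes, and function names are assumptions, not the thesis's actual objective.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class StatNet(nn.Module):
    """MINE-style statistics network T(d) over scalar pair distances."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, d):
        return self.net(d.unsqueeze(-1)).squeeze(-1)

def pair_distances(codes_a, codes_b):
    """Cosine distance between the relaxed (e.g. tanh) hash codes of two modalities."""
    a = F.normalize(codes_a, dim=1)
    b = F.normalize(codes_b, dim=1)
    return 1.0 - a @ b.t()

def dv_ranking_loss(stat_net, codes_a, codes_b, sim):
    """sim[i, j] = 1 when item i (modality A) and item j (modality B) share a label.
    Maximizing the Donsker-Varadhan bound E_neg[T] - log E_pos[exp(T)] drives the
    negative- and positive-pair distance distributions apart; its negation is minimized.
    Assumes the batch contains at least one positive and one negative pair."""
    d = pair_distances(codes_a, codes_b).flatten()
    t = stat_net(d)
    mask = sim.flatten() > 0
    pos, neg = t[mask], t[~mask]
    dv_bound = neg.mean() - (torch.logsumexp(pos, dim=0) - math.log(pos.numel()))
    return -dv_bound
```

Minimizing such a loss jointly over the statistics network and the code networks separates the two distance distributions without hand-crafted margins or hard-negative sampling, which is the spirit of the objective summarized above.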
Keywords/Search Tags: cross-modal hashing, semantic gap, learning to rank, semantic disentanglement, mutual information