
Research On Controlled Semantic Embedding And Deep Mutual Information For Cross-modal Hashing

Posted on: 2022-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: R Yang
Full Text: PDF
GTID: 2518306782952499
Subject: Automation Technology
Abstract/Summary:
In recent years, with the development of the Internet, a large amount of multimedia data (e.g., images, text, and video) has been generated, and these data are usually stored in databases. Finding semantically related information in these databases is a challenging task, and information retrieval is the means of solving it. Cross-modal hashing, one branch of information retrieval, offers fast computation and high storage efficiency and is the most promising solution for information retrieval over large-scale multimedia data. Starting from three subtasks of cross-modal hashing and combining techniques such as controlled semantic embedding and deep mutual information, this thesis conducts in-depth research on the semantic gap, learning to rank, and quantization. The main research contents are as follows.

The first innovation of this thesis addresses the first subtask, the semantic gap. Most existing methods map heterogeneous features directly into a common subspace, which inevitably produces highly entangled representations and prevents them from bridging the modality gap. This thesis presents a novel deep framework, Learning Controlled Semantic Embedding for Cross-Modal Retrieval, which attempts to learn disentangled representations with a controlled semantic structure for cross-modal retrieval. The proposed method constructs a conditional variational autoencoder and a discriminator for each modality separately and enhances the variational autoencoder with a discriminator-driven feedback mechanism. Benefiting from this feedback, the generative network learns more interpretable semantic representations for the different modalities by exploiting the rich semantic information in the training samples and excluding information irrelevant to the retrieval task. Combined with the self-supervised semantic information provided by the label network, the model learns disentangled representations with a controlled semantic structure before the sample features are mapped into the common semantic space.
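To make the mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of a conditional variational autoencoder for one modality whose reconstructions are scored by a discriminator, with the discriminator's response fed back into the VAE objective. It illustrates the general idea only; the layer sizes, module names, and loss weights are assumptions and do not reproduce the thesis's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalVAE(nn.Module):
    """Conditional VAE for one modality: encodes a feature conditioned on its label vector."""
    def __init__(self, feat_dim=4096, label_dim=24, latent_dim=128):  # dimensions are illustrative
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim + label_dim, 1024), nn.ReLU())
        self.mu = nn.Linear(1024, latent_dim)
        self.logvar = nn.Linear(1024, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + label_dim, 1024), nn.ReLU(),
            nn.Linear(1024, feat_dim))

    def forward(self, x, y):
        h = self.encoder(torch.cat([x, y], dim=1))                # condition on labels
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        x_hat = self.decoder(torch.cat([z, y], dim=1))
        return x_hat, mu, logvar, z

class Discriminator(nn.Module):
    """Scores how realistic a (reconstructed) feature looks; trained separately on real vs. fake."""
    def __init__(self, feat_dim=4096):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, 1))

    def forward(self, x):
        return self.net(x)

def generator_step(vae, disc, x, y, beta=1.0, gamma=0.1):
    """One VAE update: reconstruction + KL + a discriminator-driven feedback term."""
    x_hat, mu, logvar, z = vae(x, y)
    recon = F.mse_loss(x_hat, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    logits = disc(x_hat)
    # Feedback: reconstructions should look real to the discriminator, which pressures
    # the latent code z to retain label-relevant semantics and drop irrelevant detail.
    feedback = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return recon + beta * kld + gamma * feedback, z   # z plays the role of the controlled embedding
```

In such a setup, one such pair of networks would be built per modality, and the latent codes z from both modalities would then be aligned in the common semantic space, in the spirit of the framework described above.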
The second innovation focuses on designing a reasonable ranking loss to optimize the learning-to-rank problem in the common subspace. By modeling the learning-to-rank subtask of cross-modal hashing as the problem of separating the distance distributions of positive and negative samples, this thesis presents a novel deep framework called Hashing with Deep Mutual Information Maximization. Built on a Mutual Information Neural Estimator, the proposed method can estimate and optimize the mutual information between the distance distributions of positive and negative samples generated during ranking learning. Combined with the self-supervised semantic information provided by the label network, the proposed mutual-information objective is easily extended to multiple modalities and accurately describes the learning-to-rank problem in cross-modal hashing. The whole process requires neither carefully constructed hyperparameters for the ranking loss nor heuristic sampling to avoid the hard-sample problem.

This thesis conducts comprehensive experiments on four benchmark datasets: MIRFLICKR-25K, NUS-WIDE, MS-COCO, and IAPR-TC12. The proposed approaches and 9 comparison algorithms are evaluated with hash code lengths of 16, 32, and 64 bits. The first innovation improves the average mAP by more than 1%, 3%, and 9% on the three datasets, and the second innovation improves the average mAP by more than 2%, 4%, 14%, and 6% on the four datasets. The experimental results verify the superiority of the proposed methods over existing cross-modal hashing retrieval algorithms.
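As an illustration of the mutual-information-style ranking objective described in the second innovation above, here is a small, hypothetical PyTorch sketch that applies a MINE-style statistics network to the distances of positive and negative cross-modal pairs and maximizes a Donsker-Varadhan bound to push the two distance distributions apart. The cosine-distance choice, network shapes, and function names are assumptions, not the thesis's actual objective.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class StatNet(nn.Module):
    """MINE-style statistics network T(d) over scalar pair distances."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, d):
        return self.net(d.unsqueeze(-1)).squeeze(-1)

def pair_distances(codes_a, codes_b):
    """Cosine distance between the relaxed (e.g. tanh) hash codes of two modalities."""
    a = F.normalize(codes_a, dim=1)
    b = F.normalize(codes_b, dim=1)
    return 1.0 - a @ b.t()

def dv_ranking_loss(stat_net, codes_a, codes_b, sim):
    """sim[i, j] = 1 when item i (modality A) and item j (modality B) share a label.
    Maximizing the Donsker-Varadhan bound E_neg[T] - log E_pos[exp(T)] drives the
    negative- and positive-pair distance distributions apart; its negation is minimized.
    Assumes the batch contains at least one positive and one negative pair."""
    d = pair_distances(codes_a, codes_b).flatten()
    t = stat_net(d)
    mask = sim.flatten() > 0
    pos, neg = t[mask], t[~mask]
    dv_bound = neg.mean() - (torch.logsumexp(pos, dim=0) - math.log(pos.numel()))
    return -dv_bound
```

Minimizing such a loss jointly over the statistics network and the code networks separates the two distance distributions without hand-crafted margins or hard-negative sampling, which is the spirit of the objective summarized above.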
Keywords/Search Tags: cross-modal hashing, semantic gap, learning to rank, semantic disentanglement, mutual information