
Research On Cross-modal Retrieval Method Based On Adversarial Network

Posted on: 2021-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: F Shang
Full Text: PDF
GTID: 2428330602464582
Subject: Computer software and theory
Abstract/Summary:
With the rapid advance of communication and Internet technology, multi-modal data has grown explosively. Massive multi-modal data not only benefits users but also poses new challenges to information retrieval technology. To better satisfy users' requirements for retrieval across modalities, and to give computers the ability to simulate cognition, learning, and decision-making over multi-modal data, cross-modal retrieval has emerged as the times require. A deep neural network (DNN) acts as a multi-layer nonlinear projection with stronger mapping ability than shallow models, and can fully extract multi-level abstract representations of different modalities. In particular, a generative adversarial network (GAN) can effectively fit the distribution of multi-modal data and better learn shared representations of different modalities. On the basis of adversarial networks, this thesis integrates the ideas of dictionary learning, metric learning, and dual subspaces, effectively captures the structural and semantic information of multi-modal data, and bridges the heterogeneity gap and the semantic gap. The main works and contributions are as follows:

1. This thesis proposes a Semantic Consistency cross-modal Dictionary learning algorithm with rank Constraint (SCDC), which integrates the l2,1-norm and a rank constraint into dictionary learning. It then introduces the generative adversarial mechanism and proposes Adversarial Cross-Modal Retrieval Based on Dictionary Learning (DLA-CMR), which utilizes dictionary learning to reconstruct discriminative features and takes advantage of adversarial learning to mine the complex statistical characteristics of multi-modal data. Specifically, the method constructs two antagonists: feature preservation and modality classification. The former ensures that the transformed features (features projected into the common space) have maximum correlation while retaining the statistical characteristics inherent to their own modality, effectively eliminating the heterogeneity gap. The latter is essentially a binary classifier that predicts the original modality of a transformed feature. Feature preservation and modality classification pursue opposite goals; through their continual contest both improve, and the model finally learns a common space that effectively crosses the heterogeneity gap and the semantic gap.

2. This thesis proposes cross-modal Dual Subspace learning with an Adversarial Network (DSAN), which considers dual subspaces, metric learning, and adversarial learning simultaneously. In particular, the dual subspaces effectively mine the structural information of each modality and make full use of modality-specific information. An improved quadruplet loss is proposed that considers both relative and absolute distances, pushing the boundary between positive and negative samples further apart to some extent. Hard sample mining is introduced, which effectively reduces training complexity and improves model performance. An intra-modal constrained loss is also proposed, which enlarges the distance between the closest cross-modal negative instances and the corresponding cross-modal positive instances. In addition, adversarial learning enables the different modalities to learn better feature representations in the dual subspaces, effectively improving cross-modal retrieval accuracy.
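The feature-preservation/modality-classification contest described above can be sketched as a minimal two-player training loop. This is an illustrative toy only, not the thesis's actual model: the linear projectors, the logistic modality classifier, the MSE pair-alignment term standing in for the correlation objective, and all dimensions and learning rates are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_txt, d_common, n = 8, 6, 4, 32
X_img = rng.normal(size=(n, d_img))           # image features (toy data)
X_txt = rng.normal(size=(n, d_txt))           # matched text features
W_img = rng.normal(scale=0.1, size=(d_img, d_common))  # image projector
W_txt = rng.normal(scale=0.1, size=(d_txt, d_common))  # text projector
w_cls = rng.normal(scale=0.1, size=d_common)  # modality classifier weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def invariance_loss(Z_img, Z_txt):
    # Feature-preservation stand-in: matched pairs should coincide
    # in the common space (mean squared pair distance).
    return np.mean(np.sum((Z_img - Z_txt) ** 2, axis=1))

init = invariance_loss(X_img @ W_img, X_txt @ W_txt)
lr = 0.05
for _ in range(200):
    Z_img, Z_txt = X_img @ W_img, X_txt @ W_txt
    p_img, p_txt = sigmoid(Z_img @ w_cls), sigmoid(Z_txt @ w_cls)
    # Classifier step: binary cross-entropy, label 1 = image origin.
    grad_w = (Z_img.T @ (p_img - 1) + Z_txt.T @ p_txt) / n
    w_cls -= lr * grad_w
    # Projector step: fool the classifier AND keep matched pairs aligned.
    p_img, p_txt = sigmoid(Z_img @ w_cls), sigmoid(Z_txt @ w_cls)
    g_img = (1 - p_img)[:, None] * w_cls / n + 2 * (Z_img - Z_txt) / n
    g_txt = -p_txt[:, None] * w_cls / n + 2 * (Z_txt - Z_img) / n
    W_img -= lr * (X_img.T @ g_img)
    W_txt -= lr * (X_txt.T @ g_txt)

final = invariance_loss(X_img @ W_img, X_txt @ W_txt)
print(final < init)  # matched pairs should move closer in the common space
```

The alternating updates mirror the "constant fight" in the abstract: the classifier sharpens its modality prediction, while the projectors are pushed both to confuse it and to keep paired features close, so the common space becomes modality-indistinguishable.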
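The abstract does not give the exact form of the improved quadruplet loss, so the following sketch shows one common quadruplet formulation, a relative term (anchor-positive vs. anchor-negative) plus an "absolute" term over a pair of negatives, together with batch-hard negative mining. The function names, margins m1/m2, and the mining rule are assumptions for illustration, not the thesis's actual definitions.

```python
import numpy as np

def pairwise_dist(A, B):
    """Squared Euclidean distances between rows of A and rows of B."""
    return np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)

def quadruplet_loss(anchor, positive, neg1, neg2, m1=1.0, m2=0.5):
    """Relative term: anchor-positive closer than anchor-negative by m1.
    Absolute term: anchor-positive also closer than the neg1-neg2 pair
    by m2, pushing an absolute bound between positives and negatives."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)
    d_an = np.sum((anchor - neg1) ** 2, axis=1)
    d_nn = np.sum((neg1 - neg2) ** 2, axis=1)
    rel = np.maximum(0.0, d_ap - d_an + m1)   # relative-distance hinge
    ab = np.maximum(0.0, d_ap - d_nn + m2)    # absolute-distance hinge
    return np.mean(rel + ab)

def hardest_negatives(anchors, candidates):
    """Hard sample mining: for each anchor pick the closest candidate,
    so each update focuses on the most violating negatives."""
    d = pairwise_dist(anchors, candidates)
    return candidates[np.argmin(d, axis=1)]

rng = np.random.default_rng(1)
a = rng.normal(size=(5, 4))                   # anchors (toy embeddings)
p = a + 0.1 * rng.normal(size=(5, 4))         # positives near anchors
negs = rng.normal(loc=3.0, size=(8, 4))       # pool of negatives
n1 = hardest_negatives(a, negs)               # hardest w.r.t. anchors
n2 = hardest_negatives(p, negs)               # hardest w.r.t. positives
# (n1 and n2 may coincide here; a real sampler would draw n2 from a
# different class than n1.)
loss = quadruplet_loss(a, p, n1, n2)
print(float(loss) >= 0.0)  # hinge terms are nonnegative -> True
```

Mining only the hardest in-batch negatives is what keeps the quadruplet construction tractable: instead of enumerating all quadruplets, each anchor contributes one maximally violating quadruplet per update.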
Keywords/Search Tags:Cross-modal retrieval, Adversarial network, Dictionary learning, Metric learning, Deep learning