
Research on Cross-Modal Retrieval Methods Based on the Generative Adversarial Mechanism

Posted on: 2022-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Ma
Full Text: PDF
GTID: 2518306542963779
Subject: Software engineering
Abstract/Summary:
Retrieval has long been a hot topic in both academia and industry. Earlier work focused mainly on single-modal retrieval, such as retrieving similar images with an image query. With the rapid development of the Internet, however, people constantly use mobile applications to produce large amounts of multi-modal data such as images, text, video, and audio, and single-modal retrieval techniques have become increasingly inadequate for people's diverse retrieval needs. For example, a user may want to retrieve related video clips or text descriptions with a picture. To better satisfy these needs and make better use of massive multi-modal data resources, cross-modal retrieval has become very important. Given a query instance of one modality, cross-modal retrieval aims to find semantically similar instances of other modalities. Although great progress has been made in cross-modal retrieval in recent years, the "modality gap" remains a major challenge: because instances of different modalities differ in distribution and representation, their similarity cannot be measured directly. Deep learning has developed rapidly in recent years, and many powerful deep network models have been proposed, among which the generative adversarial network is one of the most representative. Because of its strong ability to fit the distribution of real data features and to generate discriminative feature representations, the generative adversarial network has been introduced into more and more cross-modal retrieval methods. To reduce the negative effect of the modality gap on cross-modal retrieval to a certain extent, this thesis proposes two cross-modal retrieval methods that incorporate the generative adversarial idea. The main contributions are as follows:

(1) Many existing cross-modal retrieval methods based on the generative adversarial mechanism contain only a modal discriminator that determines which modality a generated sample belongs to, so inter-modal invariance cannot be fully explored, which limits retrieval accuracy. To address this problem, this thesis proposes a cross-modal retrieval method based on full-modal autoencoders and the generative adversarial mechanism. In this method, two parallel full-modal autoencoders embed samples of the image modality and the text modality into a common space. Each full-modal autoencoder reconstructs not only the features of its own modality but also the features of the other modality. To better preserve semantic discrimination in the common space, a classifier connected to the middle layers of the two full-modal autoencoders maps common-space features into the label space. To explore inter-modal invariance, a modal discriminator is introduced, and two further discriminators are added on top of it to explore deeper inter-modal invariance. Through adversarial learning, the learned common space preserves both semantic discrimination and inter-modal invariance.
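The following PyTorch sketch illustrates the structure just described: two full-modal autoencoders with intra- and cross-modal reconstruction, a shared classifier on the common space, and a modal discriminator trained adversarially. The thesis does not specify layer sizes, feature dimensions, or loss weights, so all of those (e.g. 4096-d image features, 300-d text features, a 128-d common space) are illustrative assumptions, and the two additional discriminators mentioned above are omitted for brevity.

```python
# Minimal sketch of the full-modal autoencoder idea; all sizes and
# weightings are assumptions, not the thesis's actual configuration.
import torch
import torch.nn as nn

class FullModalAutoencoder(nn.Module):
    """Encodes one modality into the common space and reconstructs BOTH
    its own modality's features and the other modality's features."""
    def __init__(self, in_dim, common_dim, other_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, common_dim))
        self.dec_self = nn.Linear(common_dim, in_dim)      # intra-modal reconstruction
        self.dec_cross = nn.Linear(common_dim, other_dim)  # cross-modal reconstruction

    def forward(self, x):
        z = self.encoder(x)
        return z, self.dec_self(z), self.dec_cross(z)

common_dim, n_classes = 128, 10                        # assumed sizes
img_ae = FullModalAutoencoder(4096, common_dim, 300)   # image features -> common space
txt_ae = FullModalAutoencoder(300, common_dim, 4096)   # text features  -> common space
classifier = nn.Linear(common_dim, n_classes)          # preserves semantic discrimination
modal_disc = nn.Sequential(nn.Linear(common_dim, 64), nn.ReLU(),
                           nn.Linear(64, 1))           # image-vs-text discriminator

img_feat = torch.randn(8, 4096)                        # toy batch of paired features
txt_feat = torch.randn(8, 300)
labels = torch.randint(0, n_classes, (8,))

z_i, rec_ii, rec_it = img_ae(img_feat)                 # rec_it: image -> text space
z_t, rec_tt, rec_ti = txt_ae(txt_feat)                 # rec_ti: text  -> image space

mse, ce, bce = nn.MSELoss(), nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()
loss_rec = (mse(rec_ii, img_feat) + mse(rec_it, txt_feat) +
            mse(rec_tt, txt_feat) + mse(rec_ti, img_feat))
loss_cls = ce(classifier(z_i), labels) + ce(classifier(z_t), labels)

# The discriminator is trained to tell image embeddings (target 1) from
# text embeddings (target 0); the encoders are trained to fool it, which
# pushes the two modalities toward a modality-invariant common space.
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)
loss_disc = bce(modal_disc(z_i), ones) + bce(modal_disc(z_t), zeros)
loss_gen = loss_rec + loss_cls - loss_disc             # illustrative min-max weighting
```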
(2) Many existing supervised cross-modal retrieval methods do not take full advantage of the semantic discrimination information contained in the labels. In particular, for methods that handle multi-label data, the similarity matrix computed from category labels can only indicate whether two samples are similar or not; it cannot express a more refined similarity relation between them. To address this problem, this thesis proposes an adversarial cross-modal retrieval method that combines multi-level semantic relations with an attention mechanism. The method computes a multi-level similarity matrix that represents a finer-grained similarity relationship between samples and guides the network to preserve cross-modal multi-level semantic similarity. To better preserve semantic discrimination in the common space, a classifier connected at the ends of the two parallel network branches predicts labels for the embedded features of the different modalities. An attention mechanism assigns higher weights to the informative parts of each sample so that the network pays more attention to them during training. In addition, a generative adversarial mechanism is introduced to explore inter-modal invariance while preserving semantic discrimination.
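As an illustration of the multi-level similarity idea, the sketch below grades similarity by the degree of label overlap between two multi-label samples instead of making a binary same/different decision. The thesis does not give its exact formula; the Jaccard-style normalization used here is an assumption.

```python
# Illustrative multi-level similarity matrix from multi-hot label vectors.
# 0 = no shared labels, 1 = identical label sets, in between = partial overlap.
import torch

def multilevel_similarity(labels_a, labels_b):
    """labels_a: (n, c) multi-hot matrix; labels_b: (m, c). Returns (n, m)."""
    inter = labels_a.float() @ labels_b.float().t()        # count of shared labels
    counts_a = labels_a.sum(dim=1, keepdim=True).float()   # (n, 1)
    counts_b = labels_b.sum(dim=1, keepdim=True).float()   # (m, 1)
    union = counts_a + counts_b.t() - inter                # Jaccard-style union
    return inter / union.clamp(min=1)                      # graded similarity in [0, 1]

# Toy multi-label batch: 3 samples, 4 categories
img_labels = torch.tensor([[1, 1, 0, 0],
                           [0, 1, 1, 0],
                           [0, 0, 0, 1]])
txt_labels = torch.tensor([[1, 1, 0, 0],
                           [1, 0, 1, 0],
                           [0, 0, 0, 1]])
S = multilevel_similarity(img_labels, txt_labels)
print(S)  # e.g. S[0,0] = 1.0 (identical labels), S[0,1] = 1/3 (partial overlap)
```

A graded matrix like this can then supervise the common-space embedding, so that pairs sharing more labels are pulled closer together than pairs sharing only one.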
Keywords/Search Tags:generative adversarial mechanism, full-modal autoencoder, cross-modal retrieval, attention mechanism