
Research on Cross-Modal Retrieval Methods Based on Supervised Deep Neural Networks

Posted on: 2022-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: X K Luo
Full Text: PDF
GTID: 2518306557967299
Subject: Control Science and Engineering
Abstract/Summary:
With the vigorous development of computer science and artificial intelligence, we have entered the era of big data. The video, office, and reading software used every day generates massive amounts of data, mostly in the form of images, text, video, and audio. In this context, users no longer search only within a single modality; they also need more flexible search engines, for example to retrieve image data with a text query. Cross-modal retrieval has therefore gradually become a research hotspot, and scholars have proposed a variety of cross-modal retrieval methods based on supervised deep neural networks. These methods project data of different modalities into a unified feature space through non-linear mappings, mining the semantic information of multi-modal data and enhancing its semantic relevance. However, how to effectively eliminate the modality gap while retaining as much modality information as possible, and how to fully mine the discriminative features that benefit retrieval, have not been effectively studied. To address these problems, this paper proposes three cross-modal retrieval methods based on supervised deep neural networks:

1. To reduce the modality gap while better retaining the original information of each modality, Cross-Modal Retrieval via Dual Adversarial Autoencoders (DAA) is proposed. A global adversarial network improves the data reconstruction process of the autoencoders: through a min-max game, original features and reconstructed features become difficult to distinguish. In addition, a hidden-layer adversarial network generates modality-invariant representations, making data from different modalities indistinguishable from each other and effectively reducing the distribution differences of multi-modal data (see the first sketch after the abstract).

2. To mine the relevance of paired cross-modal data while better retaining the original information of each modality, Discriminative Cycle Generative Adversarial Network for Cross-Modal Retrieval (DCycle GAN) is proposed. Data of one modality is generated from the other by a generative adversarial network, and the generated data then serves as the input of the second generative adversarial network, realizing cyclic generation. In this way, the network continuously learns the semantic relevance between cross-modal data while retaining the original information of each modality. In addition, a deep metric loss guides the extraction of discriminative features, ensuring that data sharing the same label stay similar and data with different labels stay apart (see the second sketch after the abstract).

3. To reduce the modality gap while retaining modality information, and to improve cross-modal retrieval efficiency, Modality-Specific and Shared Feature Learning for Cross-Modal Hashing (MSSFL) is proposed. This method combines adversarial autoencoders with hash learning: the autoencoders learn the modality-specific and shared information of the two modalities, which is then mapped into a Hamming space through the sign function to obtain hash codes. Modeling with both specific and shared features gives the output hash codes better semantic distinction between and within modalities, thereby improving retrieval efficiency (see the third sketch after the abstract).

Experiments are performed on the Wikipedia and NUS-WIDE-10k datasets, which are widely used benchmark multi-modal datasets. Comparative experiments with classic and recent methods under the same experimental settings demonstrate the effectiveness and feasibility of the proposed DAA, DCycle GAN, and MSSFL.
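The following is a minimal PyTorch sketch of the dual-adversarial idea behind DAA (item 1): one autoencoder per modality, a global discriminator per modality that separates original from reconstructed features, and a hidden-layer discriminator that separates the two modalities' latent codes. The feature dimensions (4096 for images, 300 for text), layer sizes, and unweighted loss sum are illustrative assumptions, not the thesis's exact architecture; only the generator-side losses are shown.

```python
# Sketch of DAA-style training (assumed dimensions and losses, not the
# thesis's exact architecture). Only the autoencoder/generator side is shown;
# the discriminators would be trained with the opposite labels.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def discriminator(in_dim):
    # Binary classifier (logit output) used for both adversarial games.
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 1))

img_ae, txt_ae = AutoEncoder(4096), AutoEncoder(300)
d_global_img = discriminator(4096)  # original vs. reconstructed image features
d_global_txt = discriminator(300)   # original vs. reconstructed text features
d_hidden = discriminator(128)       # image latents vs. text latents

bce = nn.BCEWithLogitsLoss()

def generator_losses(img, txt):
    z_i, rec_i = img_ae(img)
    z_t, rec_t = txt_ae(txt)
    ones = torch.ones(img.size(0), 1)
    # Min-max game 1: make reconstructions hard to tell from originals.
    l_global = bce(d_global_img(rec_i), ones) + bce(d_global_txt(rec_t), ones)
    # Min-max game 2: make text latents indistinguishable from image latents,
    # pushing both modalities toward a modality-invariant representation.
    l_hidden = bce(d_hidden(z_t), ones)
    l_rec = ((rec_i - img) ** 2).mean() + ((rec_t - txt) ** 2).mean()
    return l_rec + l_global + l_hidden
```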
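The cyclic generation and deep metric loss of DCycle GAN (item 2) can be sketched as follows. The adversarial discriminators that judge whether generated features look real are omitted for brevity, and the generator dimensions and the triplet form of the metric loss are assumptions for illustration.

```python
# Sketch of DCycle GAN-style cycle consistency and deep metric loss
# (assumed dimensions; discriminators omitted for brevity).
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                         nn.Linear(512, out_dim))

g_i2t = mlp(4096, 300)  # image features -> generated text features
g_t2i = mlp(300, 4096)  # text features  -> generated image features

def cycle_loss(img, txt):
    # Image -> text -> image and text -> image -> text: each cycle should
    # return the original features, retaining modality-specific information.
    img_cycle = g_t2i(g_i2t(img))
    txt_cycle = g_i2t(g_t2i(txt))
    return F.l1_loss(img_cycle, img) + F.l1_loss(txt_cycle, txt)

def deep_metric_loss(anchor, positive, negative, margin=1.0):
    # Triplet-style loss: pull together samples sharing a label,
    # push apart samples with different labels.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```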
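For MSSFL (item 3), the hashing step can be sketched as below: relaxed codes are learned with a tanh activation during training and binarized with the sign function for retrieval in Hamming space. The code length and the tanh relaxation are illustrative assumptions; the adversarial autoencoder that produces the specific and shared features is omitted. With ±1 codes of length L, the Hamming distance equals (L − inner product) / 2, so ranking by inner product reproduces the Hamming ranking.

```python
# Sketch of MSSFL-style hash-code generation and Hamming-space retrieval
# (assumed code length and relaxation; feature learning omitted).
import torch
import torch.nn as nn

class HashHead(nn.Module):
    def __init__(self, in_dim, code_len=64):
        super().__init__()
        self.fc = nn.Linear(in_dim, code_len)

    def forward(self, x):
        # Smooth tanh relaxation keeps the mapping differentiable in training.
        return torch.tanh(self.fc(x))

def binarize(h):
    # The sign function maps relaxed codes into Hamming space (+1/-1 codes).
    return torch.sign(h)

def hamming_rank(query_code, db_codes):
    # For +1/-1 codes, Hamming distance is (L - q . d) / 2, so sorting by
    # inner product (descending) is equivalent to sorting by Hamming distance.
    return torch.argsort(db_codes @ query_code, descending=True)
```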
Keywords/Search Tags:cross-modal retrieval, supervised learning, deep learning, autoencoder, generative adversarial network