
Research Of Cross-modal Retrieval Methods Based On Deep Learning

Posted on: 2021-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: X C Ning
Full Text: PDF
GTID: 2428330614960384
Subject: Software engineering

Abstract/Summary:
Traditional information retrieval methods based on a single modality, such as keyword search or image-to-image search, have gradually failed to meet users' actual needs. How to integrate the multi-modal information on the network and correlate semantically consistent information to discover useful content has long been a hot issue in multimedia research. The keys to cross-modal retrieval are how to represent data of different modalities and how to correlate these multi-modal features. Existing deep-learning-based cross-modal retrieval methods usually project data from different modal spaces into the same multi-modal subspace to generate uniformly distributed embedding vectors, transforming the cross-modal retrieval problem into a similarity-ranking problem among the embedding vectors in that subspace. However, existing methods often rely on word-level mappings of text features, so the generated text embedding vectors struggle to capture the high-level semantic information of the text. In response, we propose improved cross-modal retrieval methods from two different perspectives. The content of this dissertation is as follows:

(1) To address the lack of interaction between image features and text features, we design a cross-modal retrieval method based on multi-level semantic interaction. Building on existing cross-modal retrieval methods based on visual-semantic joint embedding, we perform semantic interaction between the text word vectors and the image feature vectors level by level, and supplement the text feature vectors with the resulting interaction information to generate text embedding vectors that fuse multi-level semantic interaction information. Experimental results show that this retrieval method based on multi-level semantic interaction makes semantically relevant text features and image features correlate more tightly, thereby significantly improving the retrieval performance of the model.

(2) To address the problem that sentence-level text embedding vectors tend to ignore the topic information in the image, we design a cross-modal retrieval model based on topic supplement. We first initialize an external topic matrix, then apply a multi-head self-attention mechanism to weight each text word vector, updating the text vectors and the topic matrix accordingly; through the interaction between the text vectors and the topic information, we generate text embedding vectors that fuse deep topic information. Experimental results show that this cross-modal retrieval method based on topic supplement better mines the deep topic information of the text modality, reducing the interference of redundant information in the text data and significantly improving the retrieval performance of the model.

To evaluate the proposed cross-modal retrieval models more objectively, we also design multiple experimental settings, varying the word-vector dimension, the embedding dimension of the subspace, the number of interaction levels, and the number of topics, to further verify the effectiveness of the proposed improvements.
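The shared-subspace formulation described above can be sketched as follows. This is a minimal illustration, not the dissertation's actual architecture: the linear projections, dimensions, and random features are all assumptions standing in for learned encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the dissertation's settings).
IMG_DIM, TXT_DIM, EMB_DIM = 2048, 300, 256

# Linear projections into the shared multi-modal subspace
# (in practice these would be learned deep encoders).
W_img = rng.standard_normal((IMG_DIM, EMB_DIM)) * 0.01
W_txt = rng.standard_normal((TXT_DIM, EMB_DIM)) * 0.01

def embed(features, W):
    """Project features into the subspace and L2-normalize them."""
    z = features @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def rank_by_similarity(query_emb, gallery_emb):
    """Return gallery indices sorted by cosine similarity, descending."""
    sims = gallery_emb @ query_emb  # cosine similarity: both are unit vectors
    return np.argsort(-sims)

# One text query ranked against a gallery of five images.
text_query = embed(rng.standard_normal(TXT_DIM), W_txt)
images = embed(rng.standard_normal((5, IMG_DIM)), W_img)
ranking = rank_by_similarity(text_query, images)
print(ranking)  # indices of gallery images, most similar first
```

Once both modalities live in the same subspace, retrieval in either direction (text-to-image or image-to-text) reduces to this same nearest-neighbour ranking.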
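The multi-level semantic interaction of contribution (1) might look roughly like the following sketch, in which each word vector attends over image region features and is supplemented with the attended visual context; the residual formulation, the mean-pooling step, and all dimensions are hypothetical choices for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interact(words, regions):
    """One level of semantic interaction: each word vector attends over
    the image region features and is supplemented with the attended
    visual context (hypothetical residual formulation)."""
    scores = words @ regions.T / np.sqrt(words.shape[1])  # (n_words, n_regions)
    context = softmax(scores) @ regions                   # (n_words, d)
    return words + context  # supplement text features with interaction info

def multi_level_interaction(words, regions, n_levels=3):
    """Apply the interaction level by level, then pool the word vectors
    into one text embedding that fuses all interaction levels."""
    for _ in range(n_levels):
        words = interact(words, regions)
    return words.mean(axis=0)  # simple pooling into a text embedding vector

rng = np.random.default_rng(2)
words = rng.standard_normal((10, 128))    # 10 word vectors, dim 128
regions = rng.standard_normal((36, 128))  # 36 image region features
text_emb = multi_level_interaction(words, regions)
print(text_emb.shape)  # (128,)
```

The resulting text embedding would then be compared against image embeddings in the shared subspace as usual.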
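The topic-supplement step of contribution (2) can be sketched as a single layer of scaled dot-product multi-head attention in which the external topic vectors attend over the word vectors; the head count, dimensions, and residual update here are illustrative assumptions rather than the model's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_topic_attention(words, topics, n_heads=4):
    """Let each external topic vector attend over the word vectors.

    words:  (n_words, d)  text word vectors
    topics: (n_topics, d) external topic matrix
    Returns the topic matrix updated with attended word information.
    """
    n_topics, d = topics.shape
    assert d % n_heads == 0
    dh = d // n_heads
    # Split into heads: queries from topics, keys/values from words.
    q = topics.reshape(n_topics, n_heads, dh).transpose(1, 0, 2)       # (h, t, dh)
    k = words.reshape(words.shape[0], n_heads, dh).transpose(1, 0, 2)  # (h, w, dh)
    v = k
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (h, t, w)
    out = attn @ v                                          # (h, t, dh)
    out = out.transpose(1, 0, 2).reshape(n_topics, d)       # merge heads
    return topics + out  # residual update of the topic matrix

rng = np.random.default_rng(1)
words = rng.standard_normal((12, 64))   # 12 word vectors, dim 64
topics = rng.standard_normal((5, 64))   # 5 external topic vectors
updated = multi_head_topic_attention(words, topics)
print(updated.shape)  # (5, 64)
```

The updated topic matrix would then interact with the text vectors to produce the final topic-supplemented text embedding.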
Keywords/Search Tags: Cross-modal Retrieval, Multi-hop Interactive, Topic Supplement, Multi-head Self-attention