
An Optimized Approach To Cross-Modal Retrieval Based On Multi-level Attention Mechanism

Posted on: 2022-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Kong
Full Text: PDF
GTID: 2518306572497304
Subject: Computer application technology
Abstract/Summary:
An important feature of big data is multi-modality. Data in different industries come from wide and diverse sources, such as video, images, and speech, each of which can be regarded as a modality. Given such multi-modal data, users can select content in any medium as a query condition, so cross-modal retrieval has very broad application prospects and has become a hot topic in current research. The main challenge of cross-modal retrieval is bridging the semantic gap between modalities. Existing deep-learning-based cross-modal retrieval methods use deep neural networks to learn features of each modality and then project these features into a common space for representation, which often ignores the latent semantic associations within each modality and between different modalities.

Considering the shortcomings of existing methods, this thesis uses a multi-level attention mechanism to extract fine-grained features of text and images and to learn the latent semantic associations between modalities. Self-attention is used to learn contextual local information within a modality, and guided attention is used to learn global interaction information across modalities. The model builds the self-attention unit and the collaborative (guided) attention unit into a deep neural network, and the encoded text features are used to guide the learning of image features. In addition, because image features affect the meaning of words and the position of the text in the semantic space, this thesis proposes a multi-modal adaptive gate within the multi-level attention mechanism that dynamically adjusts the representation of text features according to the image features, achieving further fusion. Finally, class labels are used to train the model, which can then be applied to the cross-modal retrieval task.

Experiments are conducted on three datasets: Wikipedia, Pascal Sentence, and NUS-WIDE-10k. The proposed method is validated through comparison and ablation experiments. The results demonstrate that it outperforms the baseline methods and significantly improves the average retrieval accuracy of cross-modal retrieval.
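To make the two attention levels concrete, the sketch below shows a single scaled dot-product attention unit that plays both roles: with identical inputs it is the intra-modal self-attention unit, and with text features as keys/values it is the guided-attention unit in which the encoded text steers the image features. This is a minimal PyTorch illustration; the thesis does not publish code, and all class names, dimensions, and the single-head design are assumptions.

```python
# Illustrative sketch of the self-attention and guided-attention units.
# Names and sizes are hypothetical, not from the thesis.
import torch
import torch.nn as nn

class AttentionUnit(nn.Module):
    """Scaled dot-product attention. With x == y it acts as the
    self-attention unit (intra-modal context); with x != y it acts as
    the guided-attention unit (text features guiding image features)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, y):
        # x supplies queries; y supplies keys and values.
        attn = torch.softmax(self.q(x) @ self.k(y).transpose(-2, -1) * self.scale, dim=-1)
        return attn @ self.v(y)

# Usage: self-attention within the image modality, then text-guided attention.
dim = 512
img = torch.randn(8, 36, dim)   # 36 region features per image (illustrative)
txt = torch.randn(8, 20, dim)   # 20 token features per sentence (illustrative)
sa, ga = AttentionUnit(dim), AttentionUnit(dim)
img_ctx = sa(img, img)          # intra-modal: contextual local information
img_guided = ga(img_ctx, txt)   # cross-modal: text encoding guides image features
```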
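The multi-modal adaptive gate can be sketched similarly: a learned sigmoid gate computed from both modalities decides, per feature dimension, how strongly the attended image context rewrites the text representation. The gating formula below (a convex blend) is one plausible realization assumed for illustration; the exact form used in the thesis may differ.

```python
# Hedged sketch of a multi-modal adaptive gate: dynamically adjusts text
# features according to image features. Formula is an assumption.
import torch
import torch.nn as nn

class MultiModalAdaptiveGate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, txt, img_ctx):
        # txt:     (batch, len, dim) token features
        # img_ctx: (batch, len, dim) image context aligned to each token
        g = torch.sigmoid(self.gate(torch.cat([txt, img_ctx], dim=-1)))
        return g * txt + (1.0 - g) * img_ctx  # per-dimension adaptive fusion
```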
Keywords/Search Tags: Cross-modal Retrieval, Common Space Representation, Multi-level Attention Mechanism, Multi-modal Adaptive Gate