Cross-modal image-text retrieval establishes connections between images and text, enabling information interaction and sharing across the two modalities; its core goal is to bridge the heterogeneous gap between them. Mainstream approaches usually detect image regions first and then associate those regions with text words. However, these methods focus mainly on the features of salient image regions, rarely consider the correlations between different regions, and fail to make full use of image region attribute labels. As a result, they cannot capture finer-grained semantic information in images and texts, which limits retrieval accuracy.

To address these problems, this paper proposes a cross-modal image-text retrieval method based on intra-modal feature enhancement, which further explores the deep intra-modal and inter-modal relationships between the image and text modalities. The method consists of three main parts: image feature extraction and processing, text feature extraction and processing, and cross-modal interaction. First, salient regions in the image are detected, the salient region features and the region attribute label features are extracted, and the two are fused to obtain a comprehensive image representation; a graph convolutional network then models the relationships among the fused image regions to achieve semantic enhancement of the salient regions. Next, text word representations are obtained using BERT and a Bi-GRU. Finally, the image and text features interact across modalities, and a stacked cross-attention mechanism infers the similarity between them. Experimental results on two benchmark datasets, MSCOCO and Flickr30K, show that the proposed method effectively improves the accuracy of image-text retrieval, validating its effectiveness.

In addition, a cross-modal image-text retrieval system is designed and implemented based on the proposed method. The system provides the core functions of "search by text" and "search by image": it receives a query sample, invokes the model to perform cross-modal retrieval, and presents the results to users in a clear, visual form. Testing verifies that the system effectively meets the demand for cross-modal image-text retrieval and has practical application value.
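The GCN-based semantic enhancement of the fused region features can be illustrated with a minimal NumPy sketch of one graph-convolution layer. This is not the paper's implementation: the region affinity matrix `adj` is assumed given, and the learned weight matrix is replaced by the identity for illustration.

```python
import numpy as np

def gcn_enhance(features, adj):
    """One graph-convolution layer (Kipf & Welling style) as a sketch of
    region semantic enhancement.

    features: (R, d) fused region features (salient-region + attribute-label).
    adj:      (R, R) symmetric region affinity matrix (an assumption here).
    Returns enhanced (R, d) features; the learned weight W is identity
    for illustration, so the layer computes ReLU(Â X).
    """
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt   # symmetric normalization D^-1/2 Â D^-1/2
    return np.maximum(norm_adj @ features, 0.0)  # ReLU activation
```

With this normalization, each region's feature becomes a degree-weighted mix of its own feature and those of its related regions, which is the semantic-enhancement effect described above.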
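The stacked cross-attention similarity inference between word features and region features can be sketched as follows (text-to-image direction only). The inverse-temperature `lam` and the final average pooling are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def stacked_cross_attention(regions, words, lam=9.0):
    """Sketch of text-to-image stacked cross-attention similarity.

    regions: (R, d) image region features; words: (T, d) word features.
    Each word attends over all regions, an attended region context is
    formed per word, and the image-text similarity is the mean cosine
    between words and their attended contexts.
    """
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    sim = np.maximum(w @ r.T, 0.0)                     # (T, R) word-region cosines, thresholded
    sim = sim / (np.linalg.norm(sim, axis=1, keepdims=True) + 1e-8)
    attn = np.exp(lam * sim)
    attn = attn / attn.sum(axis=1, keepdims=True)      # softmax over regions per word
    attended = attn @ r                                # (T, d) region context per word
    cos = np.sum(w * attended, axis=1) / (np.linalg.norm(attended, axis=1) + 1e-8)
    return float(cos.mean())                           # scalar image-text similarity
```

A matched image-text pair (words well aligned with some region) yields a similarity near 1, while unrelated pairs score lower, which is the signal used to rank retrieval results.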