As multimedia data grows explosively, people are increasingly surrounded by data of various modalities, such as images, videos, text, and audio. People constantly generate multimodal data, which drives the application of cross-modal retrieval. Although considerable progress has been made in cross-modal retrieval research, finding heterogeneous data with relevant content remains challenging. This thesis studies image-text cross-modal retrieval using interactive methods and graph matching methods. The main research contents are as follows:

1. An image-text retrieval model based on semantic filtering and adaptive pooling is proposed. This thesis uses a cross-attention mechanism to realize cross-modal interaction between images and texts, and implements a semantic filtering module that exploits the matching information and local similarity of image-text pairs to reduce the attention weights assigned to fragments in mismatched image-text pairs, thereby aligning image-text fragments with relevant semantics. When aggregating local features into global features, traditional mean or max pooling cannot achieve optimal results; this thesis therefore implements a learnable pooling module that adapts to different feature forms and adjusts the pooling method adaptively to aggregate local features into global features. Experiments were conducted on the Flickr30K and MSCOCO datasets, and the results show that, compared with existing image-text retrieval models, the proposed model based on semantic filtering and adaptive pooling improves retrieval accuracy.

2. A graph matching-based image-text retrieval model is proposed. First, salient regions in images and words in texts are used to model graph nodes, and graph convolutional networks are then used to infer the intra-modality relationships of graph nodes and extract intra-modality correlations. Second, a cross-modal feature extraction method is introduced in which the matching information between image regions and words is allowed to flow
through the graph, extracting features that contain cross-modal matching information and fully exploiting both intra-modality and inter-modality information. Finally, graph structure matching and image-text global similarity calculation are performed, and different levels of image-text matching relationships are learned from the graph structure matching information and the global similarity information. Experiments were conducted on the Flickr30K and MSCOCO datasets, and the results show that, compared with existing graph matching-based image-text retrieval models, the proposed model based on cross-graph structure matching improves retrieval accuracy.

3. This thesis designs and implements a cross-modal retrieval system, which provides cross-modal retrieval services through a browser/server architecture. The system's cross-modal retrieval model adopts the methods proposed in this thesis and can perform cross-modal retrieval on the images or texts uploaded by users, providing both text-to-image and image-to-text retrieval functions.
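The learnable pooling idea in the first contribution can be made concrete with a minimal sketch. The thesis does not specify the exact pooling form, so the example below uses generalized-mean (GeM) pooling as one well-known learnable scheme, not the thesis's actual module: a single parameter p interpolates between mean pooling (p = 1) and max pooling (p → ∞), so the model can tune the aggregation of local fragment features into a global feature during training.

```python
import numpy as np

def gem_pool(features, p):
    """Generalized-mean pooling over local features.

    features: array of shape (n_fragments, dim) with non-negative entries
              (e.g. post-ReLU region or word features).
    p:        pooling exponent; p = 1 gives mean pooling, and as p grows
              the result approaches per-dimension max pooling.
    """
    return np.power(np.mean(np.power(features, p), axis=0), 1.0 / p)

# Three local fragment features of dimension 2 (illustrative values).
x = np.array([[0.1, 2.0],
              [0.3, 4.0],
              [0.2, 6.0]])

mean_pool = gem_pool(x, 1.0)    # identical to x.mean(axis=0)
near_max = gem_pool(x, 100.0)   # close to x.max(axis=0)
```

In a trained model, p would be a learnable parameter updated by backpropagation (e.g. a `torch.nn.Parameter`), letting the network choose the pooling behavior per feature form rather than fixing mean or max in advance.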