
Image And Text Retrieval Method With Fine-grained Semantic Features

Posted on: 2023-07-12    Degree: Master    Type: Thesis
Country: China    Candidate: X Xiao    Full Text: PDF
GTID: 2558307115987989    Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet, a large amount of multimedia content containing multi-modal information such as images and text has emerged online, and the need for cross-modal retrieval over this information has grown accordingly. This thesis builds generative image-text retrieval models on a traditional recurrent neural network and on the Transformer, adds a self-attention feature fusion encoder module to the Transformer, and proposes a Transformer-based fine-grained image-text retrieval method.

To address the weak correlation between images and texts in traditional direct retrieval methods, this thesis constructs three generative retrieval models to explore how fine-grained semantic features of images affect cross-modal image-text retrieval. First, in the retrieval model based on the recurrent neural network, a GRU with attention decodes the image, the image and the text are encoded to compute their similarity, and an image caption loss function is added to the optimization so that the generated captions are as close to the real captions as possible. Second, in the retrieval model based on the Transformer, a fine-grained image-text interaction model is introduced so that the similarity between the two modalities can be computed directly inside the model. Finally, in the fine-grained Transformer-based retrieval model, a pre-trained convolutional neural network extracts the global features of the image, a pre-trained Faster R-CNN locates the target regions of the image from which local features are extracted, and the designed self-attention feature fusion encoder module fuses the global features with the local features to strengthen the representation of fine-grained semantic information in the image.

In addition, this thesis sets up image-text retrieval and image captioning task scenarios, evaluates the three generative image-text retrieval models on the ICC dataset, and analyzes the image captioning results and visualizations of the image attention distribution. Experiments show that the proposed Transformer-based fine-grained image-text retrieval method improves the accuracy of both image-text retrieval and image captioning, and enhances the representation of image target regions and texts.
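To illustrate the fusion step described above, the following is a minimal sketch of a self-attention feature fusion encoder, assuming PyTorch; the class name, feature dimension, and layer counts are hypothetical and not taken from the thesis, which only specifies that global CNN features and Faster R-CNN region features are fused by self-attention.

    import torch
    import torch.nn as nn

    class FeatureFusionEncoder(nn.Module):
        """Fuses a global image feature with region features via self-attention (sketch)."""

        def __init__(self, dim=1024, num_heads=8, num_layers=2):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

        def forward(self, global_feat, region_feats):
            # global_feat: (batch, dim) pooled feature of the whole image from a pre-trained CNN
            # region_feats: (batch, num_regions, dim) Faster R-CNN target-region features
            tokens = torch.cat([global_feat.unsqueeze(1), region_feats], dim=1)
            fused = self.encoder(tokens)  # self-attention lets global and local features attend to each other
            return fused                  # (batch, 1 + num_regions, dim) fused image representation

The fused tokens would then serve as the image-side input to the retrieval and captioning heads, which is consistent with, but more specific than, the description in the abstract.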
Keywords/Search Tags:Image-text retrieval, image caption, fine-grained features, Transformer