In many application contexts, heterogeneous data can produce more desirable outcomes than single-modal data, and multimodal learning has developed rapidly in recent years. A crucial step in understanding multimodal data is therefore bridging the semantic gap between modalities. Because present models overlook the influence of adversarial attacks when establishing semantic alignment between modalities, and are susceptible to adversarial examples during the inference stage, this thesis investigates a more robust method for establishing that alignment. Meanwhile, it explores the use of prompt learning in various applications built on visual-language pre-training models.

First, this thesis proposes a robust cross-modal training strategy to address the issue that cross-modal learning is easily attacked by adversarial examples. Adversarial examples are used as one of the augmentation methods during data augmentation, and adversarial training is incorporated into the model training process. We study the adversarial robustness of this training method on image-text retrieval, a representative cross-modal retrieval task, and analyze the effect of adversarial training on each modality for both intra- and inter-modal retrieval.

Second, current cross-modal retrieval methods mostly achieve coarse-grained semantic alignment at the image and sentence levels; however, some downstream tasks require finer-grained semantic alignment. Using prompt learning and visual-language pre-training models, we investigate strategies for locating fine-grained regions in images based on text semantics. To achieve the best localization results, entities and relations are first extracted from the natural-language text; the relations are then optimized with prompt learning to produce semantically more accurate relation graphs; finally, the relations among fine-grained image regions are matched to the relations of the text modality.

Finally, this thesis investigates the use of prompt learning to defend visual-language pre-trained models against adversarial attacks. Adversarial perturbations are imperceptible to humans, yet they change samples at the semantic level relative to the originals; vector-based prompt learning can learn from adversarial examples and automatically create templates that adapt to the semantic changes these perturbations induce. The robustness of prompt learning under adversarial examples is validated in this thesis.
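The idea of using adversarial examples as an extra augmentation during training, described above, can be sketched in miniature. This is not the thesis's actual model or attack: the logistic-regression "model", the FGSM-style single-step perturbation, the toy 2-D data, and all hyperparameters (`lr`, `eps`) are illustrative assumptions chosen only to make the training loop concrete.

```python
import numpy as np

def fgsm_perturb(x, grad_x, eps):
    """FGSM-style augmentation: step the input along the sign of its loss gradient."""
    return x + eps * np.sign(grad_x)

def loss_and_grads(w, x, y):
    """Logistic loss with gradients w.r.t. both the weights and the input."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return loss, (p - y) * x, (p - y) * w  # loss, grad_w, grad_x

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # linearly separable toy labels

w = np.zeros(2)
lr, eps = 0.1, 0.1
for _ in range(20):
    for xi, yi in zip(X, y):
        # Step on the clean sample.
        _, gw, gx = loss_and_grads(w, xi, yi)
        w -= lr * gw
        # Craft an adversarial example and treat it as an augmented sample.
        x_adv = fgsm_perturb(xi, gx, eps)
        _, gw_adv, _ = loss_and_grads(w, x_adv, yi)
        w -= lr * gw_adv

accuracy = ((X @ w > 0).astype(float) == y).mean()
```

The same two-step pattern — take a gradient step on the clean sample, then a second step on its adversarial counterpart under the same label — is what "adversarial training as augmentation" amounts to, independent of the model's scale.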
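The claim that vector-based prompts can adapt while the pre-trained encoders stay frozen can likewise be illustrated with a minimal, CoOp-style sketch. Everything here is a toy stand-in: the linear "text encoder", the mean-pooled prompt, the fixed image feature (which could be clean or adversarial), and the dimensions are assumptions for illustration, not the thesis's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_ctx, lr = 8, 4, 0.1

# Frozen stand-ins for a pre-trained vision-language model.
W_txt = rng.normal(size=(dim, dim)) / np.sqrt(dim)  # "text encoder" (a linear map here)
class_emb = rng.normal(size=dim)                    # embedding of the class name
img_feat = rng.normal(size=dim)                     # image feature (clean or adversarial)

# Learnable context vectors: the only trainable parameters.
ctx = rng.normal(size=(n_ctx, dim)) * 0.01

def text_feature(ctx):
    # Prompt = mean-pooled context vectors added to the class embedding,
    # then passed through the frozen encoder.
    return W_txt @ (ctx.mean(axis=0) + class_emb)

dist_init = np.linalg.norm(text_feature(ctx) - img_feat)
for _ in range(200):
    residual = text_feature(ctx) - img_feat
    grad_pooled = W_txt.T @ residual          # gradient w.r.t. the pooled prompt
    ctx -= lr * grad_pooled[None, :] / n_ctx  # update the prompt vectors only
dist_final = np.linalg.norm(text_feature(ctx) - img_feat)
```

Only `ctx` is ever updated: the "encoder" and embeddings stay fixed, so the prompt alone absorbs whatever shift the (possibly adversarial) image feature carries — which is the mechanism the defense above relies on.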