Research And Application Of Multi-Layered Semantic Alignment Cross Modal Retrieval

Posted on: 2024-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Ma
Full Text: PDF
GTID: 2558307127961059
Subject: Computer technology
Abstract/Summary:
With the development of social media, various types of data, such as text and images, spread rapidly across the network, and extracting meaningful information from such multimodal data has become particularly important. Cross-modal retrieval gathers information across modalities to meet users' information needs. It is an important research direction with considerable development potential in multimodal learning, artificial intelligence, and related fields. Scholars have carried out a series of studies in this direction and achieved strong results. However, narrowing the semantic gap between modalities, removing redundant features, and balancing retrieval efficiency against accuracy remain major challenges for cross-modal retrieval. To address these problems, this thesis proposes cross-modal retrieval methods based on multi-level semantic alignment. The main research contents are as follows:

(1) We propose a cross-modal retrieval method based on counterfactual reasoning that reduces the semantic gap and removes redundant features. The method applies multi-level contrastive learning based on counterfactual reasoning to cross-modal retrieval. It constructs counterfactual contrastive learning at the instance level, the image level, and the semantic level, so that the model is trained to understand image and text representations comprehensively and the semantic gap between modalities is reduced. At the same time, counterfactual thinking is introduced when constructing the positive and negative samples for contrastive learning, generating factual and counterfactual samples that are highly discriminative and thereby improving the model's perception.

(2) To balance retrieval efficiency and accuracy, we propose a semantic-enhanced cross-modal retrieval method based on a dual-stream Transformer. The method builds the model on a dual-stream Transformer, which offers high retrieval efficiency, and introduces a multi-selection task and a semantic alignment module. Using self-supervised learning, the multi-selection task enables the model to mine the semantic associations between image and text features; the semantic alignment module is used to implement the multi-selection task. This module aligns images and text semantically at multiple levels, allowing the model to improve accuracy while maintaining retrieval efficiency.

(3) A cross-modal retrieval prototype system. The system implements the above two cross-modal retrieval methods based on multi-level semantic alignment by building a front end and a back end, and demonstrates the retrieval process for two tasks: image-to-text retrieval and text-to-image retrieval.

To sum up, this thesis constructs a cross-modal retrieval method based on counterfactual reasoning, which offers a solution to the problems of narrowing the semantic gap and removing redundant features in cross-modal retrieval models; proposes a semantic-enhanced cross-modal retrieval method based on a dual-stream Transformer, which effectively improves retrieval accuracy while maintaining retrieval efficiency; and designs and implements a cross-modal retrieval prototype system, which serves as a useful exploration of multi-level semantically aligned cross-modal retrieval.
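
The abstract does not give formulas, so the following is only a generic sketch of the two ideas it names, not the thesis's actual method. It shows a standard symmetric contrastive (InfoNCE-style) loss over paired image and text embeddings, and similarity-based ranking of the kind a dual-stream encoder permits, since each modality can be embedded independently. All function names, the temperature value, and the toy embeddings are assumptions made for illustration; the counterfactual sample construction and the multi-level alignment described above are not modeled.

```python
import math

def l2_normalize(v):
    # Assumes a non-zero vector; scales it to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(image_embs, text_embs):
    # Cosine similarity between every image and every text embedding.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]

def cross_entropy_diagonal(logits):
    # Mean cross-entropy where the correct class for row k is column k,
    # i.e. the k-th image is paired with the k-th text.
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[k]
    return total / len(logits)

def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Averages the image-to-text and text-to-image losses.
    # temperature=0.07 is an assumed, commonly used value.
    sims = similarity_matrix(image_embs, text_embs)
    logits_i2t = [[s / temperature for s in row] for row in sims]
    logits_t2i = [list(col) for col in zip(*logits_i2t)]
    return 0.5 * (cross_entropy_diagonal(logits_i2t)
                  + cross_entropy_diagonal(logits_t2i))

def retrieve(query_emb, gallery_embs, top_k=1):
    # Ranks gallery items by cosine similarity to the query. With a
    # dual-stream model the gallery embeddings can be computed once, offline.
    sims = similarity_matrix([query_emb], gallery_embs)[0]
    order = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
    return order[:top_k]

# Hypothetical toy embeddings: pair 0 and pair 1 are matched.
image_embs = [[1.0, 0.1], [0.1, 1.0]]
text_embs = [[0.9, 0.2], [0.2, 0.9]]
loss = symmetric_contrastive_loss(image_embs, text_embs)
best_text_for_image_0 = retrieve(image_embs[0], text_embs)
best_image_for_text_1 = retrieve(text_embs[1], image_embs)
```

Because the two encoders never attend to each other's inputs, a retrieval query needs only one forward pass plus dot products against cached gallery embeddings. This is the usual reason dual-stream designs are considered efficient compared with models that jointly encode every image-text pair.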
Keywords/Search Tags: Cross-modal retrieval, Contrastive learning, Semantic alignment, Counterfactual reasoning, Multi-modal learning