
An Optimized Approach To Cross-Modal Retrieval Based On Multi-level Attention Mechanism

Posted on: 2022-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Kong
Full Text: PDF
GTID: 2518306572497304
Subject: Computer application technology
Abstract/Summary:
An important feature of big data is multi-modality. Data in different industries come from wide and diverse sources, such as video, images, and speech, each of which can be regarded as a modality. Given such multi-modal data, users can select content in any medium as a query condition, so cross-modal retrieval has very broad application prospects and has become a hot topic in current research. The main challenge of cross-modal retrieval is bridging the semantic gap between modalities. Existing deep-learning-based cross-modal retrieval methods use deep neural networks to learn features of each modality and then project these features into a common space for representation, which often ignores the latent semantic associations within each modality and between different modalities.

Considering the shortcomings of existing methods, this thesis uses a multi-level attention mechanism to extract fine-grained features of text and images and to learn the latent semantic associations between modalities. Self-attention is used to learn contextual local information within a modality, and guided attention is used to learn global interaction information across modalities. The model builds the self-attention unit and the collaborative (guided) attention unit into a deep neural network, and the encoded text features are used to guide the learning of image features. In addition, because image features affect the meaning of words and the position of the text in the semantic space, this thesis proposes a multi-modal adaptive gate within the multi-level attention mechanism that dynamically adjusts the representation of text features according to the image features, achieving further fusion. Finally, class labels are used to train the model, which can then be applied to the cross-modal retrieval task.

Experiments are conducted on three datasets: Wikipedia, Pascal Sentence, and NUS-WIDE-10k. The proposed method is validated through comparison and ablation experiments. The results demonstrate that it outperforms the baseline methods and significantly improves the average retrieval accuracy of cross-modal retrieval.
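To make the two attention levels concrete, the sketch below shows a single scaled dot-product attention unit that plays both roles: with identical inputs it is the intra-modal self-attention unit, and with text features as keys/values it is the guided-attention unit in which the encoded text steers the image features. This is a minimal PyTorch illustration; the thesis does not publish code, and all class names, dimensions, and the single-head design are assumptions.

```python
# Illustrative sketch of the self-attention and guided-attention units.
# Names and sizes are hypothetical, not from the thesis.
import torch
import torch.nn as nn

class AttentionUnit(nn.Module):
    """Scaled dot-product attention. With x == y it acts as the
    self-attention unit (intra-modal context); with x != y it acts as
    the guided-attention unit (text features guiding image features)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, y):
        # x supplies queries; y supplies keys and values.
        attn = torch.softmax(self.q(x) @ self.k(y).transpose(-2, -1) * self.scale, dim=-1)
        return attn @ self.v(y)

# Usage: self-attention within the image modality, then text-guided attention.
dim = 512
img = torch.randn(8, 36, dim)   # 36 region features per image (illustrative)
txt = torch.randn(8, 20, dim)   # 20 token features per sentence (illustrative)
sa, ga = AttentionUnit(dim), AttentionUnit(dim)
img_ctx = sa(img, img)          # intra-modal: contextual local information
img_guided = ga(img_ctx, txt)   # cross-modal: text encoding guides image features
```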
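The multi-modal adaptive gate can be sketched similarly: a learned sigmoid gate computed from both modalities decides, per feature dimension, how strongly the attended image context rewrites the text representation. The gating formula below (a convex blend) is one plausible realization assumed for illustration; the exact form used in the thesis may differ.

```python
# Hedged sketch of a multi-modal adaptive gate: dynamically adjusts text
# features according to image features. Formula is an assumption.
import torch
import torch.nn as nn

class MultiModalAdaptiveGate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, txt, img_ctx):
        # txt:     (batch, len, dim) token features
        # img_ctx: (batch, len, dim) image context aligned to each token
        g = torch.sigmoid(self.gate(torch.cat([txt, img_ctx], dim=-1)))
        return g * txt + (1.0 - g) * img_ctx  # per-dimension adaptive fusion
```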
Keywords/Search Tags: Cross-modal Retrieval, Common Space Representation, Multi-level Attention Mechanism, Multi-modal Adaptive Gate