
Study On Attention-aware Prototype Learning Joint Correlation Alignment For Cross-modal Retrieval

Posted on: 2022-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Zhu
Full Text: PDF
GTID: 2518306536463734
Subject: Computer Science and Technology
Abstract/Summary:
As a fundamental technology in information retrieval, cross-modal retrieval aims to model the correlation between different modalities of multimedia data, so that data of one type can be used to precisely retrieve semantically relevant data of other modalities. Compared with the uni-modal retrieval paradigm, cross-modal retrieval provides users with more diverse results that enrich their perception. This thesis concentrates on cross-modal retrieval based on real-valued representation learning and strives to tackle three key problems: the semantic discrimination of mapped feature representations, the semantic correlation across different modalities of data, and the scalability of the model. To this end, we propose an attention-aware prototype learning joint correlation alignment method for cross-modal retrieval (APLCA), which learns feature representations with semantic consistency, intra-class compactness and inter-class sparsity to improve retrieval performance. The main contributions of this thesis are as follows:

(1) To address the insufficient discrimination of mapped feature representations, an attention-based prototype learning method is proposed to extract strongly discriminative features. To alleviate the impact of noisy instances, instance-level attention is used to initialize the prototype representations, and a prototype network learns a metric space in which the representations exhibit intra-class compactness and inter-class sparsity, enhancing the semantic discrimination of the mapped features.

(2) To model the semantic correlation across different modalities effectively, we propose a deep semantic correlation alignment method that can be embedded seamlessly into the fully connected layers of a neural network. It minimizes the difference of second-order statistics between heterogeneous data, aligning the data distributions of different modalities; meanwhile, the distance between data of different modalities that share identical semantics is decreased, guiding the network to generate modality-invariant features with semantic consistency.

(3) To address the poor scalability of the model in real scenarios, this thesis makes good use of the prototype network based on instance-level attention, which is used to initialize prototypes from a small number of instances of unseen categories. We then fine-tune the network to extract discriminative features, so that the model not only achieves good retrieval performance on the original dataset but also works well on unseen classes, which verifies its good scalability.

(4) We conducted extensive experiments on three widely used cross-modal retrieval datasets, Wikipedia, Pascal Sentence and NUS-WIDE-10k, whose results verify the validity and scalability of APLCA. In addition, we evaluated the performance of various approaches on different retrieval tasks by comparing with state-of-the-art methods. The experimental results show that APLCA achieves the best performance.
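The attention-based prototype initialization of contribution (1) can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the particular attention score (dot-product similarity to the plain class mean) and the Euclidean nearest-prototype rule are assumptions chosen for clarity; the thesis's instance-level attention and learned metric space may differ.

```python
import numpy as np

def attention_prototype(feats):
    """Initialize a class prototype as an attention-weighted mean.

    Instances more similar to the plain class mean receive larger
    attention weights, down-weighting noisy outliers. `feats` is (n, d).
    """
    mean = feats.mean(axis=0)
    # Instance-level attention scores: similarity to the class mean.
    scores = feats @ mean
    # Softmax over instances (shifted for numerical stability).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ feats  # (d,) attention-weighted prototype

def nearest_prototype(query, prototypes):
    """Assign a query to the class of its nearest prototype (Euclidean)."""
    dists = np.linalg.norm(prototypes - query, axis=1)
    return int(np.argmin(dists))
```

In a full system the prototypes would be refined by training the prototype network so that instances are pulled toward their own class prototype and pushed away from others, yielding the intra-class compactness and inter-class sparsity described above.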
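The correlation alignment of contribution (2) minimizes the difference of second-order statistics between heterogeneous features. A minimal sketch of such a loss, assuming the second-order statistics are feature covariance matrices (in the style of the CORAL loss); the exact statistics and normalization used in APLCA may differ:

```python
import numpy as np

def coral_loss(src, tgt):
    """Squared Frobenius distance between the covariance matrices of two
    modalities' feature batches. `src` is (n, d), `tgt` is (m, d); driving
    this loss to zero aligns the second-order statistics of the two
    distributions."""
    d = src.shape[1]

    def cov(x):
        # Unbiased sample covariance of a (num_samples, d) batch.
        xc = x - x.mean(axis=0, keepdims=True)
        return xc.T @ xc / (x.shape[0] - 1)

    diff = cov(src) - cov(tgt)
    return float(np.sum(diff ** 2)) / (4 * d * d)
```

Because the loss is differentiable in the features, a framework equivalent of this function can be added to the objective at a fully connected layer and trained jointly with the retrieval losses, which is how such an alignment term is typically embedded into the network.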
Keywords/Search Tags:Cross-modal Retrieval, Prototype Learning, Attention Mechanism, Correlation Alignment