Cross-modal Representation Learning Based On Multi-negatives Supervised Contrastive Mechanism And Its Application

Posted on: 2022-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: K X Ding
GTID: 2518306560955089
Subject: Computer application technology
Abstract/Summary:
With the advent of the intelligent information age, data in different modalities, such as images, video, text, and audio, are growing rapidly on the Internet. These multi-modal data describe complex real-world scenes in different forms and provide useful complementary information to one another. Data from different modalities are clearly heterogeneous in their original form, which makes it difficult to associate them directly at the semantic level. It is therefore necessary to map the modalities into a common subspace and reduce the differences between them. Cross-modal representation learning aims to narrow the discrepancy between modalities at the feature level, establishing semantic relationships across modalities while enlarging the differences between categories within each modality. The learned representations can then be applied to multi-modal machine learning tasks such as cross-modal retrieval.

Contrastive loss has recently driven rapid progress in representation learning. In previous work, it has mostly been used for self-supervised representation learning. Because labels are unavailable in the self-supervised setting, the samples obtained by data augmentation of an anchor sample are treated as positive samples, and all remaining samples in the dataset are treated as negative samples by default. However, some of the remaining samples are likely to belong to the same category as the anchor, which gives rise to "false negative samples". When the contrastive loss contrasts the positive samples with these "false negative samples" in the feature space and widens the distance between them, a poor feature representation is learned. In addition, previous work with contrastive loss has focused mainly on single-modality representation learning and has rarely been applied to cross-modal representation learning.

To address these two problems, this thesis introduces the supervised contrastive loss and proposes a cross-modal representation learning algorithm (SCCMRL) based on a multi-negatives supervised contrastive mechanism, and it builds a corresponding cross-modal retrieval system on top of SCCMRL. The performance on the cross-modal retrieval task is verified on multi-modal datasets. The experimental results show that the SCCMRL model outperforms current mainstream cross-modal retrieval models. The main contributions of this thesis are as follows:

1. The multi-negatives supervised contrastive mechanism is applied to the cross-modal domain. This both avoids the "false negative samples" phenomenon of self-supervised representation learning and achieves cross-modal representation learning.
2. A cross-modal representation learning algorithm (SCCMRL) based on the multi-negatives supervised contrastive mechanism is proposed. In this algorithm, encoders obtain the feature representations of the different modalities, and the supervised contrastive loss contrasts positive samples with negative samples, so that data with the same semantics move closer together and data with different semantics move farther apart in the feature space. In addition, SCCMRL introduces a label loss and a center loss to further optimize the learned feature representation. Through the combination of these three loss functions, the cross-modal feature representation learned by SCCMRL has both modality consistency and semantic discrimination.
3. SCCMRL is applied to the task of cross-modal retrieval. Cross-modal image-text retrieval and cross-modal audio-visual retrieval are achieved on different multi-modal datasets.
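The supervised contrastive idea described above can be illustrated with a minimal sketch. It is not the SCCMRL implementation: the abstract gives no formula, so the function name `cross_modal_supcon_loss`, the temperature value, the cosine-similarity scoring, and the toy 2-D features are all assumptions made for illustration. Anchors come from one modality and candidates from another; every candidate that shares the anchor's label counts as a positive, so same-class samples are never pushed away as "false negatives".

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_modal_supcon_loss(anchors, anchor_labels, candidates,
                            candidate_labels, temperature=0.1):
    """Hypothetical supervised contrastive loss across two modalities.

    For each anchor, all candidates with the same label are positives and
    all candidates with a different label are negatives (assumed form,
    not taken from the thesis).
    """
    anchors = [l2_normalize(a) for a in anchors]
    candidates = [l2_normalize(c) for c in candidates]
    total, counted = 0.0, 0
    for a, label in zip(anchors, anchor_labels):
        logits = [dot(a, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        positives = [z for z, y in zip(logits, candidate_labels) if y == label]
        if not positives:
            continue  # anchor has no same-class candidate in this batch
        # Average negative log-probability over this anchor's positives.
        total += -sum(z - log_denom for z in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0

# Toy features (assumed): two "image" anchors and three "text" candidates.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
image_labels = [0, 1]
text_feats = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

# Labels that agree with the geometry give a low loss ...
loss_aligned = cross_modal_supcon_loss(image_feats, image_labels,
                                       text_feats, [0, 0, 1])
# ... while labels that contradict it give a much higher loss.
loss_mismatched = cross_modal_supcon_loss(image_feats, image_labels,
                                          text_feats, [1, 1, 0])
print(loss_aligned, loss_mismatched)
```

In the aligned case, the two text samples with label 0 both act as positives for the first image anchor. A self-supervised contrastive loss with a single positive per anchor would instead treat one of them as a negative.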
Keywords/Search Tags: cross-modal representation learning, multi-negatives supervised contrastive mechanism, supervised contrastive loss, cross-modal retrieval