
Deep Metric Learning For Cross-Modal Retrieval

Posted on: 2022-12-08  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J W Wei  Full Text: PDF
GTID: 1488306764460104  Subject: Automation Technology
Abstract/Summary:
With the rapid development of the mobile internet and social media applications, multimedia data is growing rapidly. Multimedia data composed of images, video, audio, and text is multi-source and heterogeneous in form yet interrelated in semantics. How to effectively organize, manage, and retrieve massive multimedia data, freeing users from the limitation of media types, is a current hot issue in industry and academia. Cross-modal retrieval aims to find instances in other modalities that are semantically similar to a query from one modality, helping users locate content of interest within large amounts of multimedia data. The core of cross-modal retrieval is accurately computing the similarity between samples from different modalities, which is usually driven by metric learning, so retrieval performance relies on the ability of deep metric learning to mine and weight informative pairs. However, both the heterogeneity and the imbalanced distribution of cross-modal data pose new challenges for deep metric learning. First, in the face of complex and diverse cross-modal retrieval scenarios, there is an urgent need for a task-independent design criterion for metric learning methods that can provide theoretical guidance for developing new ones. Second, the large amount of noise contained in multimedia data severely challenges the generality and robustness of metric learning algorithms in practical applications. Finally, traditional metric learning methods inevitably involve hyper-parameters, which usually require fine-tuning on a validation set and consume considerable computational time and resources; developing algorithms that learn hyper-parameters automatically is therefore a pressing challenge. To address these issues, this dissertation makes the following novel contributions:

1. To address the lack of guidelines for designing generic metric learning methods, we propose a novel universal weighting metric learning framework. The framework associates each pair's similarity score with its weight value through an abstract weighting function, and we give the constraints that this weighting function should satisfy. The framework unifies existing weighted loss functions, provides a powerful tool for analyzing their interpretability, and offers a theoretical basis for designing new weighted loss functions. We analyze existing weighted loss functions in detail and conduct extensive experiments to evaluate their performance in cross-modal scenarios.

2. To address the lack of universal metric learning methods in cross-modal retrieval, we design a novel self-similarity polynomial loss function based on the proposed universal weighting framework. It uses a polynomial function to associate the weight value of a sample pair with its own similarity score, adaptively assigning an appropriate weight to each pair during training. Benefiting from the general approximation ability of polynomial functions, the proposed loss can fit a wide range of weighting functions. The self-similarity polynomial loss can be applied to existing state-of-the-art cross-modal retrieval methods: keeping the network structure unchanged and retraining with our loss significantly improves both convergence speed and retrieval performance.
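To make the weighting idea of contributions 1 and 2 concrete, the following is a minimal, hedged sketch of a self-similarity polynomial weighting loss in PyTorch. The coefficient values, polynomial degree, margins, and hinge-style form are illustrative assumptions and do not reproduce the dissertation's exact loss; only the core idea, that each pair's weight is a polynomial of its own similarity score, is taken from the text above.

```python
import torch


def self_similarity_polynomial_loss(sim, pos_mask,
                                    pos_coeffs=(1.0, -1.0),
                                    neg_coeffs=(0.5, 1.0),
                                    pos_margin=0.8,
                                    neg_margin=0.2):
    """Illustrative sketch of a self-similarity polynomial weighting loss.

    sim      : (N, N) image-text cosine similarity matrix.
    pos_mask : (N, N) boolean matrix, True where (i, j) is a matching pair.
    Coefficients, degree, and margins are placeholder assumptions.
    """
    neg_mask = ~pos_mask

    # Weight each pair by a polynomial of its own similarity score s:
    # positives get weight 1 - s (hard, low-similarity positives weigh more),
    # negatives get weight 0.5 + s (hard, high-similarity negatives weigh more).
    pos_weight = sum(c * sim.pow(k) for k, c in enumerate(pos_coeffs))
    neg_weight = sum(c * sim.pow(k) for k, c in enumerate(neg_coeffs))

    # Hinge-style penalties: positives below pos_margin and
    # negatives above neg_margin contribute to the loss.
    pos_term = pos_weight * torch.relu(pos_margin - sim) * pos_mask
    neg_term = neg_weight * torch.relu(sim - neg_margin) * neg_mask

    return (pos_term.sum() + neg_term.sum()) / sim.size(0)
```

Under these placeholder coefficients, a positive pair with similarity s receives weight 1 - s, so harder pairs are emphasized; higher-degree polynomials could approximate other weighting curves, which is the approximation property the abstract refers to.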
3. Based on the universal weighting metric learning framework, we develop a novel relative-similarity polynomial loss for cross-modal retrieval. It retains the general approximation ability of the self-similarity polynomial loss while greatly reducing the number of hyper-parameters, making it easier to find a good parameter combination in practical scenarios. The relative-similarity polynomial loss can likewise be applied to existing cross-modal retrieval methods, further improving their convergence speed and retrieval performance. Extensive experiments show that it outperforms the self-similarity polynomial loss.

4. To address the difficulty of setting hyper-parameters in existing deep metric learning methods, we introduce a novel meta self-paced network that automatically learns the optimal weighting mechanism from data. The meta self-paced network can be applied to a variety of cross-modal scenarios, saves substantial hyper-parameter fine-tuning time, and resolves the hyper-parameter setting problem in cross-modal metric learning.
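As a rough illustration of contribution 4, the sketch below shows what a learned weighting mechanism could look like: a tiny network that maps similarity scores to pair weights and is itself optimized rather than hand-tuned. The architecture, layer sizes, optimizer, and the bi-level update outlined in the comments are assumptions for illustration only, not the dissertation's actual meta self-paced network.

```python
import torch
import torch.nn as nn


class MetaWeightNet(nn.Module):
    """Illustrative sketch of a learned pair-weighting network.

    Maps a pair's similarity score to a weight in (0, 1), so the weighting
    curve is learned from data instead of being fixed by hand-tuned
    hyper-parameters. Architecture and sizes are assumptions.
    """

    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, sim):
        # sim: (num_pairs,) similarity scores -> (num_pairs,) pair weights
        return self.net(sim.unsqueeze(-1)).squeeze(-1)


# In a typical bi-level (meta-learning) scheme, the weighting network's
# parameters would be updated so that the retrieval model, after a weighted
# training step, performs well on a small held-out meta batch; the inner and
# outer loops are omitted here for brevity.
weight_net = MetaWeightNet()
meta_optimizer = torch.optim.Adam(weight_net.parameters(), lr=1e-3)
```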
Keywords/Search Tags:Deep metric learning, universal weighting metric learning framework, polynomial loss, meta self-paced learning, cross-modal retrieval