
Metric Learning For Open Environment

Posted on: 2020-09-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H J Ye
Full Text: PDF
GTID: 1368330578965568
Subject: Computer Science and Technology
Abstract/Summary:
Metric learning generates effective representations from data so that any two objects can be compared. Equipped with the learned features, the similarity or distance between objects reveals their relationship, and the learned metric or representation greatly facilitates "downstream" applications. Most previous metric learning research, however, focuses on a stable, closed environment, which assumes static features, enough training examples, and a single, fixed semantic. Real-world applications instead operate in an open environment, characterized by "noisy inputs", "scarce examples", "changing features", and "complex semantics". In this thesis, we propose a framework that follows an "input-output" perspective for the open environment, enabling metric learning to handle the complex open environment both theoretically and algorithmically. Specifically, this thesis has four main parts.

1. We analyze the generalization ability of metric learning theoretically, and propose two strategies to reduce the training sample complexity. Classical machine learning approaches require a large number of training examples to mimic the true distribution and serve as input to the model. In some real applications, however, the collection and labeling cost of examples must be taken into account, so only a few training examples can be used. Chapter 3 analyzes the generalization ability of metric learning from two perspectives, i.e., properties of the function and metric reuse. Compared with previous results, our analyses improve the convergence rate of the generalization gap, which indicates that the model works better with a limited number of training examples. Besides, we validate the theoretical results with extensive synthetic simulations.

2. We propose a framework that bridges changing feature spaces between tasks with a semantic mapping and only a small amount of training data. In addition to the few-shot problem, when dealing with a new task in the open environment, a change of the feature space further increases the difficulty of model reuse. In Chapter 4, we propose to reuse a heterogeneous model by linking the relationship between feature sets in a "meta-feature" space. In our REFORM framework, optimal transport transforms a well-trained heterogeneous classifier from the previous task into an effective model prior for the current task, which can then be adapted with a limited number of target examples. Two implementations, with adaptive scale and tuned transformation, are investigated in experiments on various real applications. Notably, no raw data from the previous task is used in the whole model reuse process, which preserves privacy across tasks.
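The abstract does not spell out REFORM's exact formulation, so the following is only a rough sketch of the underlying idea in plain NumPy, with entirely hypothetical names and shapes (sinkhorn_plan, meta_old, meta_new, w_old): given meta-feature descriptions of the two feature sets, an entropic optimal-transport plan between them can be used to carry a classifier trained on the old features over to the new feature space, yielding a prior that would then be refined on the few target examples. REFORM's actual adaptive-scale and tuned-transformation variants may differ from this illustration.

import numpy as np

def sinkhorn_plan(C, reg=0.1, n_iter=200):
    """Entropic optimal transport between two uniform marginals, given a cost matrix C."""
    m, n = C.shape
    a, b = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    K = np.exp(-C / reg)                      # Gibbs kernel
    u, v = np.ones(m), np.ones(n)
    for _ in range(n_iter):                   # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]        # transport plan, shape (m, n)

rng = np.random.default_rng(0)
meta_old = rng.normal(size=(8, 5))            # 8 old features, each described by 5 meta-features (made up)
meta_new = rng.normal(size=(12, 5))           # 12 new features, each described by 5 meta-features (made up)

# Cost = squared Euclidean distance between meta-feature descriptions, normalized for stability.
C = ((meta_old[:, None, :] - meta_new[None, :, :]) ** 2).sum(-1)
C = C / C.max()
P = sinkhorn_plan(C)

# Barycentric projection: map a linear classifier over old features onto the new feature space,
# giving a model prior to be adapted with the few target examples.
w_old = rng.normal(size=8)
w_prior = (P.T @ w_old) / P.sum(axis=0)
print(w_prior.shape)                          # (12,)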
3. We propose a unified framework to learn an adaptive number of metrics and discover the rich semantic components in data. Many real-world objects, such as images and texts, contain rich semantics, yet existing metric learning methods use only a single semantic to measure the relationship between objects. In Chapter 5, we first propose the concept of a "semantic metric" to deal with relationship ambiguity, and then introduce a unified multi-metric learning framework, UM2L. It not only explores rich semantics but also greatly facilitates downstream applications. Besides, to determine the number of metrics needed to cover localities and semantics, we propose a multi-metric framework, LIFT, which allocates an adaptive number of metrics. With the help of a global metric, LIFT avoids overfitting and improves classification. Experiments on real-world problems verify the properties of these two frameworks.

4. We propose a method to learn a robust distance metric and deal with uncertainty in the feature and label spaces by instance perturbation. The open environment is usually affected by noise. On one hand, features are perturbed, so the attributes of objects are not depicted correctly; on the other hand, the relationships between objects are also uncertain, and some similar objects are recorded as dissimilar in the dataset. In Chapter 6, we analyze such a noisy environment from a probabilistic perspective, and point out that both kinds of uncertainty can be attributed to feature perturbation. Therefore, we use the expected distance to measure the similarity between objects, which takes all kinds of noise distributions into consideration. In our DRIFT approach, "helpful" noise is intelligently introduced into the metric learning process to augment the dataset and improve the robustness of the learned metric. The metric learned by DRIFT has better generalization ability and reveals the true relationship between pairs of objects.
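The exact DRIFT objective is not given in this abstract, but the "expected distance" idea can be illustrated with a standard identity: if two examples are perturbed by independent zero-mean Gaussian noise with covariances Sx and Sy, the expected squared Mahalanobis distance equals the clean distance (x-y)^T M (x-y) plus a trace penalty tr(M (Sx + Sy)). The sketch below (hypothetical variable names, NumPy only) checks this identity by Monte Carlo; it is not DRIFT itself, only the expectation it builds on.

import numpy as np

rng = np.random.default_rng(0)
d = 4                                          # feature dimension (arbitrary)
A = rng.normal(size=(d, d))
M = A @ A.T + np.eye(d)                        # a positive-definite Mahalanobis metric
x, y = rng.normal(size=d), rng.normal(size=d)

# Diagonal noise covariances for the two examples (hypothetical values).
Sx = np.diag(rng.uniform(0.1, 0.5, size=d))
Sy = np.diag(rng.uniform(0.1, 0.5, size=d))

# Closed form: E ||(x+ex) - (y+ey)||_M^2 = (x-y)^T M (x-y) + tr(M (Sx + Sy))
diff = x - y
expected = diff @ M @ diff + np.trace(M @ (Sx + Sy))

# Monte Carlo estimate of the same quantity.
n = 200_000
ex = rng.multivariate_normal(np.zeros(d), Sx, size=n)
ey = rng.multivariate_normal(np.zeros(d), Sy, size=n)
z = (x + ex) - (y + ey)                        # noisy differences, shape (n, d)
mc = np.einsum('ni,ij,nj->n', z, M, z).mean()

print(expected, mc)                            # the two values should closely agree

Because the noise contributes an additive term governed by the metric and the noise covariances, optimizing over such expected distances lets the learned metric account for input uncertainty rather than fitting the observed (possibly corrupted) features alone.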
Keywords/Search Tags: machine learning, open environment, metric learning, few-shot learning, multi-semantic, multi-metric, robustness