Research And Applications Of Robust Distance Metric Learning In The Presence Of Label Noise

Posted on: 2019-11-13    Degree: Doctor    Type: Dissertation
Country: China    Candidate: D Wang
GTID: 1368330590466694    Subject: Computer Science and Technology
Abstract/Summary:
Distance metric learning (DML), one of the most popular machine learning techniques, has been widely used in real-world applications such as object recognition, face verification and image retrieval. Most DML algorithms learn a mapping that projects the original data into a new space with desirable properties; for example, in the new space, semantically similar data points should be pulled together while dissimilar ones are pushed apart. Although DML has achieved great success, most existing work assumes that the class labels of the data are clean. However, many real-world datasets, especially those collected from the Internet, contain a number of wrong labels, which can severely degrade performance. This is mainly because label noise can mislead a DML model into pulling together dissimilar data points and pushing apart similar ones; training with label noise therefore needs more iterations and may converge to a poor local solution. This thesis focuses on developing robust and efficient DML algorithms for real-world datasets corrupted by label noise. Its main contributions are summarized as follows.

(1) An effective pre-processing approach for label-noisy data is proposed, consisting of an unsupervised feature learning network, C-SVDDNet, and a label-denoising network, LDAE.
(2) A latent-variable probabilistic DML algorithm is proposed, in which the true labels of the data are modeled as latent variables and the EM algorithm is employed to iteratively estimate the latent variables and the model parameters.
(3) A robust DML algorithm based on variational Bayes, Bayesian NCA, is proposed. Because it is constrained by a graph over the data rather than by pairwise constraints, this model exploits the structural information of the data better than pairwise-constrained models. Furthermore, for efficient Bayesian inference, a fixed-curvature variational lower bound on the log-likelihood is proposed, which substantially reduces the training cost.
(4) A Bayesian large-margin DML algorithm, Bayesian LMNN, is proposed. It extends standard LMNN to the Bayesian setting and uses stochastic variational Bayesian inference for parameter estimation. Furthermore, we theoretically show that the method is robust to label noise and derive its generalization error bound and sample complexity. When the training data are corrupted by label noise, Bayesian LMNN usually has a smaller generalization error than standard LMNN.
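To make the "pull similar together, push dissimilar apart" idea and the effect of label noise concrete, the sketch below evaluates two standard objectives that the abstract builds on, the NCA objective (Goldberger et al.) and the LMNN loss (Weinberger and Saul), for a linear map L in pure Python. It is an illustration only, not the thesis's code: the helper names (`sq_dist`, `nca_objective`, `target_neighbours`, `lmnn_loss`), the parameters `mu`, `margin` and `k`, and the toy data are hypothetical choices, and the EM-based and Bayesian variants proposed in the thesis are not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(L: Matrix, x: Sequence[float]) -> Vector:
    """Apply the linear map L (a k x d matrix) to the d-dimensional vector x."""
    return [sum(row[c] * x[c] for c in range(len(x))) for row in L]


def sq_dist(L: Matrix, a: Sequence[float], b: Sequence[float]) -> float:
    """Squared distance ||L a - L b||^2, i.e. a Mahalanobis distance with M = L^T L."""
    z = project(L, [ai - bi for ai, bi in zip(a, b)])
    return sum(v * v for v in z)


def nca_objective(L: Matrix, X: List[Vector], y: List[int]) -> float:
    """NCA objective: expected number of points whose softmax-sampled neighbour shares their label."""
    total = 0.0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        logits = [-sq_dist(L, X[i], X[j]) for j in others]
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(v - m) for v in logits]
        same = sum(w for j, w in zip(others, weights) if y[j] == y[i])
        total += same / sum(weights)
    return total


def target_neighbours(X: List[Vector], y: List[int], k: int = 1) -> List[List[int]]:
    """Hypothetical helper: the k nearest same-label points under the Euclidean metric."""
    d = len(X[0])
    identity = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    targets = []
    for i in range(len(X)):
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        same.sort(key=lambda j: (sq_dist(identity, X[i], X[j]), j))
        targets.append(same[:k])
    return targets


def lmnn_loss(L: Matrix, X: List[Vector], y: List[int],
              targets: List[List[int]], mu: float = 0.5, margin: float = 1.0) -> float:
    """LMNN loss: (1 - mu) * pull term on target neighbours + mu * hinge push term on impostors."""
    pull, push = 0.0, 0.0
    for i, neighbours in enumerate(targets):
        for j in neighbours:
            d_ij = sq_dist(L, X[i], X[j])
            pull += d_ij
            for l in range(len(X)):
                if y[l] != y[i]:  # differently labelled point: a potential impostor
                    push += max(0.0, margin + d_ij - sq_dist(L, X[i], X[l]))
    return (1.0 - mu) * pull + mu * push


# Toy data (hypothetical): two well-separated clusters; in Y_NOISY the last point is mislabelled.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
Y_CLEAN = [0, 0, 0, 1, 1, 1]
Y_NOISY = [0, 0, 0, 1, 1, 0]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

for name, labels in (("clean", Y_CLEAN), ("noisy", Y_NOISY)):
    print(name,
          round(nca_objective(IDENTITY, X, labels), 4),
          lmnn_loss(IDENTITY, X, labels, target_neighbours(X, labels)))
```

With the identity map, the clean labels give an LMNN loss of 3.0 and an NCA objective just under 6, since every point's soft neighbour shares its label. Flipping the single label raises the LMNN loss to 77.5, mostly because the mislabelled point is now pulled toward a target neighbour in the far cluster while its true neighbours count as impostors, and it lowers the NCA objective to about 4.23. This is the "pull dissimilar points together, push similar points apart" effect of label noise described above.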
Keywords/Search Tags: distance metric learning, feature representation, label noise, overfitting, robustness, EM, variational Bayes, generalization error