
Cross-modal Metric Learning For Heterogeneous Face Recognition

Posted on: 2018-11-15 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J Huo | Full Text: PDF
GTID: 1318330512490774 | Subject: Computer Science and Technology
Abstract/Summary:
Heterogeneous face recognition deals with matching face images from different modalities or sources, for example matching near-infrared face images to visible-light face images, sketches to photos, or low-resolution to high-resolution face images. This thesis focuses on cross-modal metric learning for heterogeneous face recognition. The goal is to learn metrics that remove the modality variations of heterogeneous face representations, so that cross-modal intra-personal and inter-personal distances become separable. Four new methods are proposed, as follows:

(1) A Margin based Cross-Modal Metric Learning (MCM2L) method is proposed. In heterogeneous face recognition, features are usually strongly influenced by modality variations, making intra-personal and inter-personal cross-modal distances inseparable. To address this, a cross-modal metric is defined in a common subspace into which samples of the two modalities are mapped and measured. The objective is to learn metrics that satisfy two constraints: the first minimizes pairwise intra-personal cross-modal distances, and the second forces a margin between subject-specific intra-personal and inter-personal cross-modal distances. This allows the method to concentrate on optimizing the distances of subjects whose intra-personal and inter-personal distances are hard to separate. The method is further extended to a Kernelized Margin based Cross-Modal Metric Learning (KMCM2L) method. Both have been evaluated on three cross-modal datasets; in extensive experiments and comparisons with state-of-the-art methods, MCM2L and KMCM2L achieved marked improvements in most cases.

(2) A cross-modal metric learning method for AUC optimization (CMLAUC) is proposed. Existing methods mostly focus on minimizing losses defined on sample pairs. However, the numbers of intra-class and inter-class sample pairs can be highly imbalanced in heterogeneous face recognition, which can degrade performance. The Area Under the ROC Curve (AUC) is a more meaningful performance measure for such imbalanced problems. To tackle the imbalance, and to make samples from different modalities directly comparable, a cross-modal metric learning method is presented that directly maximizes AUC. An extended version further optimizes partial AUC (pAUC), the AUC between two specified false positive rates, which is particularly useful in applications where only performance within a predefined false-positive range is critical. The method is formulated as a Log-Determinant (LogDet) regularized semi-definite optimization problem, and a mini-batch proximal point algorithm is developed for efficient optimization. Several datasets are adopted for evaluation, including three cross-modal face recognition datasets covering various scenarios and one single-modal face recognition dataset. Results demonstrate the effectiveness of the proposed methods and marked improvements over existing methods; in particular, pAUC-optimized cross-modal metric learning proves more competitive on performance measures such as Rank-1 accuracy and VR@FPR=0.1%.

(3) A method for learning an Ensemble of Sparse Cross-Modal Metrics (ESPAC) is proposed. The performance of heterogeneous face recognition is influenced not only by modality inconsistency but also by occlusions, illumination variations, expressions, and so on. To tackle these issues, a weak sparse cross-modal metric learning method is first developed to measure distances between samples of two modalities: it learns to adjust rank-one cross-modal metrics to satisfy two sets of triplet-based cross-modal distance constraints in a compact form. Meanwhile, group-based feature selection is performed to discard features attributable to "noise" (occlusions, illumination variations, expressions, etc.). Finally, an ensemble framework combines the differently learned sparse metrics into a strong one. Extensive experiments on various face datasets demonstrate the benefit of such feature selection, especially when heavy occlusions exist, and the proposed ensemble metric learning shows superiority over several state-of-the-art methods in heterogeneous face recognition.

(4) A Variation Robust Cross-Modal Metric Learning (VR-CM2L) method is proposed, designed specifically for caricature recognition, a special case of heterogeneous face recognition. This recognition process involves many variations: caricature-related variations include facial appearance exaggeration and changes of artistic style, while others include view changes, expressions, and illumination. All of these lead to severe misalignment between the features of caricatures and photos. To deal with this problem, a heterogeneous facial-landmark-based feature extraction scheme is proposed: at each fixed facial landmark, photo features are extracted at a fixed view and scale, while caricature features are extracted at multiple scales and view angles. To measure the similarity of these features, a cross-modal metric is learned at each facial landmark, with pooling at the distance level used to align the features of caricatures and photos. All cross-modal metrics at the different landmarks are learned within one optimization framework to guarantee a globally optimal solution. Experimental results on two caricature datasets demonstrate the effectiveness of the proposed method under various variations; moreover, the heterogeneous face feature extraction scheme together with the proposed VR-CM2L achieves better results than homogeneous feature extraction based methods.
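The core idea shared by these methods, learning projections so that matched cross-modal pairs are close while mismatched pairs are pushed beyond a margin, can be sketched in a few lines. The following is a toy illustration of the margin-based objective in method (1), assuming linear projections, mean-based distance terms, and a numerical gradient; the dimensions, data, and parameter names are all illustrative and are not the thesis's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for heterogeneous features: modality A (e.g. sketches, dim 12)
# and modality B (e.g. photos, dim 16), 4 subjects with 3 samples each.
n_subj, per_subj, d_a, d_b, d_common = 4, 3, 12, 16, 6
labels = np.repeat(np.arange(n_subj), per_subj)
X_a = rng.normal(size=(n_subj * per_subj, d_a)) + 0.5 * labels[:, None]
X_b = rng.normal(size=(n_subj * per_subj, d_b)) + 0.5 * labels[:, None]

def mcm2l_loss(W_a, W_b, margin=1.0):
    """Simplified margin-based cross-modal loss: map both modalities into a
    common subspace, pull matched (intra-personal) cross-modal pairs together,
    and force inter-personal distances to exceed intra-personal by a margin."""
    diff = (X_a @ W_a)[:, None, :] - (X_b @ W_b)[None, :, :]
    D = (diff ** 2).sum(axis=2)                  # all cross-modal distances
    same = labels[:, None] == labels[None, :]
    intra, inter = D[same].mean(), D[~same].mean()
    return intra + max(0.0, margin + intra - inter)

def num_grad(f, W, eps=1e-4):
    """Central finite-difference gradient (fine for a toy-sized problem)."""
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W[idx] += eps; fp = f()
        W[idx] -= 2 * eps; fm = f()
        W[idx] += eps
        g[idx] = (fp - fm) / (2 * eps)
    return g

W_a = rng.normal(scale=0.1, size=(d_a, d_common))
W_b = rng.normal(scale=0.1, size=(d_b, d_common))
f = lambda: mcm2l_loss(W_a, W_b)
before = f()
for _ in range(20):
    for W in (W_a, W_b):
        g, cur, step = num_grad(f, W), f(), 1e-2
        for _ in range(8):                       # crude backtracking search
            W -= step * g
            if f() < cur:
                break
            W += step * g
            step *= 0.5
after = f()
print(f"loss before: {before:.3f}  after: {after:.3f}")
```

The actual methods replace each simplified piece: per-subject margin constraints instead of global means in (1), an AUC/pAUC surrogate loss in (2), rank-one sparse metrics with feature-group selection in (3), and per-landmark metrics with distance-level pooling in (4).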
Keywords/Search Tags:Face Recognition, Cross Modal, Metric Learning, AUC Optimization, Sparse Learning, Feature Selection, Ensemble Learning