Research On Visual Object Recognition In The Framework Of Metric Learning

Posted on: 2020-05-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H J Liu
GTID: 1368330623458186
Subject: Signal and Information Processing
Abstract/Summary:
Visual object recognition is one of the core topics in the computer vision and artificial intelligence community and has great academic value. It also has a wide range of application prospects and considerable economic value in many aspects of human social life. Metric learning, which learns to measure the similarity of a sample pair, is a powerful framework for visual object recognition. This dissertation studies visual object recognition within the framework of metric learning. After analyzing existing metric learning models, we focus, in the traditional metric learning framework, on the design of loss functions and on re-ranking as a post-processing technique. In the deep metric learning framework, we concentrate on the construction of deep neural network models and on the design of deep metric loss functions. For each visual object recognition task, we then design an efficient metric learning model based on the specific characteristics of that task. The main contributions of this dissertation are as follows.

1. For the parent-child kinship verification task, which should be treated as an asymmetric metric problem because of the effect of "age", we propose a status-aware projection metric learning (SPML) loss function. SPML uses two status-specific projections, one for parents and one for children, to capture the significant appearance commonality between them. The loss function is formulated on a geometric metric that measures the similarity of face image pairs. Furthermore, with different arrangements of parameters, the two status-specific projections can be learned either separately or simultaneously during optimization. Experimental results on three kinship verification datasets demonstrate the effectiveness of SPML compared with methods that use only a single Mahalanobis distance metric.

2. For re-ranking as a post-processing technique in the person re-identification task, we propose a gallery-based k-reciprocal-like re-ranking (GKR) method that improves metric learning by addressing the heavy cross-camera discrepancy between the query and gallery datasets. GKR adopts graph matching to construct the matching correspondence between the query and gallery datasets. The proposed k-reciprocal-like neighbors are then computed only on the gallery dataset. Moreover, GKR can also be applied to unsupervised video-based person re-identification, where it not only improves cross-camera label estimation in the training step but also improves re-identification accuracy through re-ranking in the testing step. The experiments show that GKR improves the performance of metric learning methods.

3. For the general image retrieval task, we focus on deep metric loss functions. By analyzing the relationship between the loss function and pair distances, we unify existing pair-based loss functions in a general pair-based weighting loss formulation, in which the objective to be minimized is simply a weighted sum of the distances of informative pairs. The formulation has two main aspects: sample mining and pair weighting. We review existing pair-based losses in detail in line with this general loss function and explore some possible methods from the perspective of sample mining and pair weighting. The general formulation can guide the design of loss functions in a simpler and more direct way. Experimental results on three image retrieval datasets demonstrate the effectiveness of the general pair-based weighting loss formulation compared with existing pair-based loss functions.

4. For the cross-domain person re-identification task, we focus on enhancing the discriminative feature extraction of the deep model. The deep model is modified in two ways, by introducing an attention mechanism and by incorporating mid-level features, so that it extracts discriminative partial person features. We adopt two popular types of self-attention mechanisms: long-range dependency based attention and direct generation based attention. When a trained model is applied directly to new domains, our methods achieve strong cross-domain person re-identification performance on three person re-identification datasets, and even outperform methods that leverage auxiliary information from the target-domain data.

5. For the visible-thermal cross-modality person re-identification task, we focus on two difficult problems: the cross-modality discrepancy and intra-modality variations. From the perspective of deep model modification and loss function design, we adopt a two-stream CNN structure, mid-level feature incorporation, and a dual-modality triplet loss to enhance discriminative feature learning. With these three simple and efficient operations, our methods achieve the best performance on two visible-thermal person datasets, outperforming existing state-of-the-art methods by a large margin.
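The general pair-based weighting idea in contribution 3 (mine the informative pairs, then minimize a weighted combination of their distances) can be illustrated with a minimal NumPy sketch. This is not the dissertation's formulation: the function names, the margin-based mining rule for negative pairs, the uniform default weights, and the convention that negative-pair distances enter with a minus sign are all assumptions made for illustration only.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance between every pair of rows in x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def uniform_weights(d):
    """Hypothetical weighting rule: every mined pair counts equally."""
    return np.ones_like(d)


def pair_weighting_loss(embeddings, labels, margin=1.0,
                        pos_weight=uniform_weights,
                        neg_weight=uniform_weights):
    """Weighted sum of distances over mined pairs (illustrative only).

    Mining (assumed rule): keep all positive pairs, and keep only the
    negative pairs whose distance is below `margin`.
    Weighting: `pos_weight` / `neg_weight` map distances to weights.
    Positive distances are added (pulled together); negative distances
    are subtracted (pushed apart).
    """
    d = pairwise_distances(embeddings)
    same = labels[:, None] == labels[None, :]
    # Upper triangle without the diagonal, so each pair is counted once.
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)

    pos = d[same & upper]
    neg = d[~same & upper]
    neg = neg[neg < margin]  # sample-mining step for negative pairs

    return float((pos_weight(pos) * pos).sum()
                 - (neg_weight(neg) * neg).sum())


# Toy usage: two samples of class 0 and one sample of class 1.
x = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
y = np.array([0, 0, 1])
loss = pair_weighting_loss(x, y, margin=2.0)
```

With uniform weights and this mining rule, the sketch matches a contrastive-style loss on unsquared distances up to an additive constant. Swapping in a different `pos_weight` or `neg_weight` callable (for example, one that emphasizes harder pairs) changes only the weighting step and leaves the mining step untouched, which is the separation of concerns the abstract describes.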
Keywords/Search Tags: visual object recognition, metric learning, deep learning, neural network model, loss function