
Research On Metric Learning Based Support Vector Machine Algorithm And Its Applications

Posted on: 2022-08-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y B Ruan
Full Text: PDF
GTID: 1488306317494174
Subject: Computer applications engineering
Abstract/Summary:
Distance metric learning learns an appropriate distance metric for complex datasets. By using the learned metric to reshape the data distribution, the distance between similar samples is reduced while the distance between dissimilar samples is enlarged. Most existing distance metric learning methods are nearest-neighbor metric learning methods and adopt the kNN model to predict instances. However, the k-nearest-neighbor model has limitations: to classify each test sample it must access and store all training samples, and the choice of the nearest-neighbor number k affects classification performance. Unlike the kNN model, the support vector machine is a non-nearest-neighbor model that learns a separating hyperplane to classify test samples, thereby avoiding the influence of a test sample's nearest neighbors. However, most existing support vector machine methods adopt the Euclidean distance to measure the distance between samples. The Euclidean distance considers only the similarity of all features between samples and cannot exploit the label information between samples. For samples that have different labels but lie close to each other, a Euclidean-distance-based support vector machine may assign them to the same class, leading to misclassification. Such support vector machine models cannot automatically learn an appropriate distance metric for the classification task at hand, and therefore cannot use the appropriate feature information to compute distances between instances. In view of the respective advantages and disadvantages of metric learning and support vector machines, we combine the two and propose several algorithms for metric-learning-based support vector machine models. The main contents of this paper are as follows:

(1) A convex model for distance-metric-learning-based support vector machines is proposed. The convex classification model CSV-DML proposed in this paper trains the metric learning model and the support vector machine classifier in a unified framework. Unlike existing non-convex metric-learning-based support vector machine models, CSV-DML is convex and can attain the globally optimal solution. To make full use of the kernel functions of existing support vector classification methods, the original samples are mapped into a specific high-dimensional feature space by a nonlinear mapping. Because the explicit form of a nonlinearly mapped sample is unknown, the original sample is further transformed into a kernel form, so that samples in kernel form can be computed explicitly. To solve the proposed convex optimization model, an iterative optimization method based on generalized block coordinate descent is developed, which converges to the global optimum. In the CSV-DML model, because the dimension of an instance in kernel form depends only on the number of original training samples, a specific parameter-reduction method is designed to reduce the feature dimension of the samples.

(2) A nearest-neighbor search model for distance metric learning is proposed. The proposed model, NNS-DML, constructs metric optimization constraints by searching for a different optimal nearest-neighbor number for each training sample. Specifically, we develop a nearest-neighbor search matrix that encodes the nearest-neighbor correlations of all training samples. Using this search matrix, the metric optimization constraints of each training sample can be constructed and weighted, so that the influence of irrelevant features on the corresponding pairs of similar and dissimilar samples is reduced. By solving the single-objective optimization problem proposed in this paper, the search matrix and the distance metric matrix of the NNS-DML classification model can be learned together. In addition, we use a support vector machine solver to build a k-free nearest-neighbor classification model, which dispenses with the setting of the nearest-neighbor number k. Experimental results show that combining the nearest-neighbor search model with the support vector classification model improves the classification performance of the support vector classification model.

(3) A support-vector-machine-based multi-task multi-instance distance metric learning model is proposed. The learning performance of a classifier can be limited by a small number of training samples. This paper proposes a multi-task, support-vector-machine-based multi-instance metric learning classification model, MT-MIDM, which learns several related classification tasks simultaneously and combines the classification information shared between tasks to improve classification performance. To push data of different classes apart and make data of the same class more compact, a class-specific Mahalanobis distance for multi-task learning is constructed. In addition, to reflect the relative importance of the samples and their classes within a multi-instance bag, a significance parameter for multi-task learning is constructed. By formulating and solving a multi-task multi-instance metric learning problem, a support-vector-machine-based multi-instance decision function is obtained for each classification task. Compared with traditional multi-instance distance metric learning methods, MT-MIDM optimizes several related classification models at the same time, exploiting the feature information shared between models to improve classification performance. Unlike existing multi-task multi-instance methods, MT-MIDM does not use the Euclidean distance; instead it uses a Mahalanobis distance adapted to the multi-task multi-instance setting to reshape the data distribution, which makes samples of different classes more separable.
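The learned metrics discussed throughout are Mahalanobis distances of the form d_M(x, y) = sqrt((x - y)^T M (x - y)) with M positive semidefinite. A minimal sketch of evaluating such a distance via the factorization M = L^T L (the factor L and the toy vectors below are illustrative assumptions, not values from the dissertation):

```python
import numpy as np

def mahalanobis_dist(x, y, L):
    """Distance under the learned metric M = L^T L.

    d_M(x, y) = sqrt((x - y)^T M (x - y)) = ||L (x - y)||_2,
    so learning L amounts to learning a linear map followed by
    the ordinary Euclidean distance in the mapped space.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.linalg.norm(L @ diff))

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])

# With L = I the metric reduces to the Euclidean distance.
print(mahalanobis_dist(x, y, np.eye(2)))  # 5.0

# A metric that down-weights the first feature shrinks distances
# along that direction, e.g. between same-class samples.
print(mahalanobis_dist(x, y, np.diag([0.1, 1.0])))
```

This is why a learned metric can separate close samples with different labels: directions that Euclidean distance treats equally can be stretched or shrunk according to the labeling information.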
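The "kernel form" used in CSV-DML can be illustrated by the standard empirical kernel map, which represents a sample by its kernel evaluations against the n training samples; this is one common construction consistent with the abstract's remark that the kernel-form dimension depends only on the number of training samples. The RBF kernel choice and toy data here are assumptions for illustration:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian RBF kernel k(a, b) = exp(-gamma * ||a - b||^2).
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))

def empirical_kernel_map(x, X_train, gamma=1.0):
    """Represent x explicitly by its kernel evaluations against all
    n training samples: k(x) = [k(x, x_1), ..., k(x, x_n)].

    The nonlinearly mapped sample itself has no explicit form, but
    this n-dimensional vector does, and n (not the feature-space
    dimension) controls its size -- motivating a parameter-reduction
    step when n is large."""
    return np.array([rbf_kernel(x, xi, gamma) for xi in X_train])

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
k_vec = empirical_kernel_map(np.array([0.0, 0.0]), X_train)
print(k_vec.shape)  # (3,)
```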
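The generalized block coordinate descent solver for CSV-DML is not detailed in the abstract; the toy sketch below only shows the underlying idea on a simple smooth convex objective with two blocks, where alternating exact minimization over each block converges to the global minimum. The objective itself is invented for illustration:

```python
# Alternating exact minimization over two blocks of the convex
# objective f(u, v) = (u - 1)^2 + (v + 2)^2 + 0.5 * (u - v)^2.
# Setting each partial derivative to zero gives the per-block
# updates below; because f is smooth and strictly convex, the
# iterates converge to the unique global minimizer, mirroring the
# convergence claim for the convex CSV-DML model.
u, v = 0.0, 0.0
for _ in range(100):
    u = (2.0 + v) / 3.0   # argmin over u with v fixed
    v = (u - 4.0) / 3.0   # argmin over v with u fixed
print(round(u, 4), round(v, 4))  # 0.25 -1.25
```

In CSV-DML the blocks would be the metric parameters and the classifier parameters rather than two scalars, but the alternating structure is the same.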
Keywords/Search Tags:Distance metric learning, Support vector classification, Nearest neighbor classification, Multi-instance learning, Multi-task learning