
Feature Weighting And Distance Metric Learning For Multiple-Instance Classification

Posted on: 2018-02-12  Degree: Master  Type: Thesis
Country: China  Candidate: X R Guo  Full Text: PDF
GTID: 2348330536966290  Subject: Electronics and Communications Engineering
Abstract/Summary:
In machine learning, multiple-instance learning is a research hotspot and is often regarded as the fourth machine learning framework. It is widely used in natural scene classification, web index page recommendation, computer security, and other fields. One of the difficulties in multiple-instance learning is the large number of features in each bag, many of which are irrelevant; these factors increase the complexity of predicting an unknown bag's label and degrade the results. It is therefore necessary to preprocess the data sets, that is, to increase the weights of the relevant features, which reduces the negative influence of both the number of features and the irrelevant features.

The classifier plays a significant role in reducing label prediction error. Nearest-neighbor classifiers based on the minimum or maximum Hausdorff distance are among the most widely used. Each of these distances has flaws in practice, but the two have complementary characteristics. A classifier that integrates the minimum and maximum Hausdorff distances can therefore not only improve classification performance but also reduce label prediction error.

This thesis focuses on feature weighting and a mixed Hausdorff distance, and proposes an improved feature-selection Simba algorithm together with an ensemble Citation-KNN classifier based on the mixed Hausdorff distance. The specific work is as follows.

First, for feature weighting of the data sets, this thesis proposes an improved Simba algorithm that increases the weights of relevant features. The classical Simba algorithm uses the Euclidean distance, which is suitable only for measuring the distance between individual points. To overcome this shortcoming, this thesis uses the minimum Hausdorff distance instead, which extends the range of application, reduces the effect of outliers, and lowers both the test error and the complexity of label prediction.

Second, to combine the minimum and maximum Hausdorff distances, this thesis uses the AdaBoost algorithm to linearly combine Citation-KNN classifiers based on these two distances, producing an ensemble classifier based on the mixed Hausdorff distance. Experimental results show that this algorithm not only combines the two distances, compensating for their individual defects, but also improves classifier performance and reduces label prediction error.

The purpose of this research is to reduce the complexity and test error of label prediction. The method is to learn from data sets, preprocessed by the improved Simba algorithm, using the ensemble classifier based on the mixed Hausdorff distance. Simulation experiments confirm the validity of these methods and show their value in reducing prediction complexity and error.
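The two bag-level distances at the core of this work can be sketched as follows. This is a minimal illustration only, assuming a Euclidean instance space: the optional feature-weight vector stands in for a Simba-style weighting step, and a fixed blend weight `alpha` stands in for the AdaBoost-learned combination; the function names and toy bags are illustrative, not taken from the thesis.

```python
import numpy as np

def pairwise_dists(A, B, w=None):
    """Weighted Euclidean distances between every instance pair of two bags.

    A, B: (n, d) and (m, d) arrays of instances.
    w: optional (d,) feature weights, e.g. from a Simba-style weighting step.
    """
    if w is None:
        w = np.ones(A.shape[1])
    diff = A[:, None, :] - B[None, :, :]          # (n, m, d) pairwise differences
    return np.sqrt(((diff ** 2) * w).sum(axis=2))  # (n, m) distance matrix

def min_hausdorff(A, B, w=None):
    # Minimal Hausdorff distance: the closest instance pair across the two bags.
    return pairwise_dists(A, B, w).min()

def max_hausdorff(A, B, w=None):
    # Classical (maximal) Hausdorff distance: the larger of the two
    # directed distances max_a min_b ||a - b|| and max_b min_a ||a - b||.
    d = pairwise_dists(A, B, w)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def predict_mixed(bag, train_bags, labels, alpha=0.5, w=None):
    """1-NN bag prediction under a convex mix of the two Hausdorff distances.

    alpha blends the minimum and maximum distances; the thesis learns the
    combination via AdaBoost over Citation-KNN classifiers, while here a
    fixed alpha is used purely for illustration.
    """
    scores = [alpha * min_hausdorff(bag, T, w)
              + (1 - alpha) * max_hausdorff(bag, T, w)
              for T in train_bags]
    return labels[int(np.argmin(scores))]
```

For example, with bags `A = [[0,0],[1,0]]` and `B = [[0,1],[5,0]]`, the minimal Hausdorff distance is 1 (the pair (0,0)-(0,1)) while the maximal Hausdorff distance is 4 (the isolated instance (5,0)), which is exactly the outlier sensitivity that motivates mixing the two distances.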
Keywords/Search Tags:Multiple-instance Learning, Hausdorff distance, Feature weighting, Simba algorithm, Citation-KNN classifier, AdaBoost algorithm