
Semi-Supervised Feature Selection Algorithms Based On Relief

Posted on: 2021-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: B G Tang
Full Text: PDF
GTID: 2428330605974759
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, high-dimensional data brings new opportunities and challenges to data mining and machine learning tasks. To extract useful knowledge from massive information, researchers have proposed dimension reduction, which maps high-dimensional data into a low-dimensional space, avoids the "curse of dimensionality", and alleviates the difficulty of learning from very large amounts of data.

Relief is an effective supervised filter-based feature selection algorithm. It estimates the weight of each feature by computing the difference between the distances from a sample to its nearest homogeneous (same-class) and heterogeneous (different-class) neighbors. The larger a feature's weight, the stronger its ability to distinguish between classes. Relief has been extended to the semi-supervised setting, but the existing semi-supervised Relief algorithms cannot handle multi-class problems. To remedy this, the thesis focuses on semi-supervised feature selection based on Relief and proposes schemes for solving multi-class feature selection within the framework of semi-supervised learning. The main work is summarized as follows:

(1) A semi-supervised feature selection algorithm based on Relief for multi-class problems, called MSLIR, is proposed. Because the existing semi-supervised Relief methods are not suitable for multi-class problems, this thesis designs a new scheme to calculate the margin vector for unlabeled data. By assigning a temporary label to an unlabeled sample, MSLIR calculates a temporary margin vector for that sample and computes the inner product of this temporary margin vector with the feature weight vector; the temporary margin vector yielding the largest inner product is taken as the final margin vector of the unlabeled sample. The objective function is then optimized over the labeled and unlabeled margin vectors to obtain the optimal feature weights. Experimental results verify the effectiveness of MSLIR on multi-class datasets.

(2) A semi-supervised Relief feature selection method based on nearest neighbors, called MSLIR-NN, is proposed. To address MSLIR's high computational complexity and poor label prediction for unlabeled data, MSLIR-NN uses a nearest neighbor classifier, built on the labeled samples, to predict the labels of the unlabeled data, after which the margin vectors of all samples are calculated. The margin vectors of the labeled and unlabeled data are combined to optimize the objective function, which yields the feature weights. Experimental results confirm that MSLIR-NN improves the prediction accuracy on unlabeled data.

(3) A semi-supervised Relief feature selection algorithm based on local structure preservation, called LPLIR, is proposed. Because existing Relief-based semi-supervised feature selection methods cannot preserve the local structure of the original data, LPLIR adds a Laplacian regularization term, which ensures that data in the original feature space has the same local structure as in the weighted feature space. Experimental results show that LPLIR outperforms existing semi-supervised feature selection methods.
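The Relief weighting rule and the MSLIR-NN pipeline summarized above can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the function names are hypothetical, the L1 distance and binary-class toy setting are simplifying assumptions, and the margin-vector optimization of MSLIR/LPLIR is omitted.

```python
import numpy as np

def relief_weights(X, y, n_iters=200, seed=0):
    """Classic supervised Relief (a sketch; names are hypothetical).

    For each randomly sampled instance, find its nearest hit (same class)
    and nearest miss (different class) under the L1 distance; a feature
    gains weight when it separates the instance from its miss more than
    from its hit.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)  # L1 distance to every sample
        dists[i] = np.inf                     # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dists, np.inf))    # nearest hit
        miss = np.argmin(np.where(~same, dists, np.inf))  # nearest miss
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iters

def mslir_nn_weights(X_l, y_l, X_u, n_iters=200, seed=0):
    """Sketch of the MSLIR-NN idea: 1-NN pseudo-labels, then Relief.

    Each unlabeled sample takes the label of its nearest labeled
    neighbor; Relief then runs on the combined labeled + pseudo-labeled
    set to produce the feature weights.
    """
    pred = np.array([y_l[np.argmin(np.abs(X_l - x).sum(axis=1))]
                     for x in X_u])
    X = np.vstack([X_l, X_u])
    y = np.concatenate([y_l, pred])
    return relief_weights(X, y, n_iters=n_iters, seed=seed)
```

On a toy dataset where feature 0 cleanly separates two classes and feature 1 is uniform noise, `relief_weights` assigns feature 0 a markedly larger weight, and `mslir_nn_weights` recovers the same ranking from a half-labeled version of the data.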
Keywords/Search Tags: feature selection, Relief, semi-supervised multi-classification, nearest neighbor, local structure preservation