
Semi-Supervised Feature Selection Algorithms Based On Relief

Posted on: 2021-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: B G Tang
Full Text: PDF
GTID: 2428330605974759
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, high-dimensional data brings new opportunities and challenges to data mining and machine learning tasks. To extract useful knowledge from massive information, researchers have proposed dimension reduction, which maps high-dimensional data into a low-dimensional space, avoids the "curse of dimensionality", and alleviates the difficulty of learning from very large amounts of data.

Relief is an effective supervised filter-based feature selection algorithm. It estimates the weight of each feature by computing the difference between the distances from a sample to its nearest homogeneous (same-class) and heterogeneous (different-class) neighbors. The larger a feature's weight, the stronger its ability to distinguish between classes. Relief has been extended to the semi-supervised setting, but the existing semi-supervised Relief algorithms cannot handle multi-class problems. To remedy this, the thesis focuses on semi-supervised feature selection based on Relief and proposes schemes for solving multi-class feature selection within the framework of semi-supervised learning. The main work is summarized as follows:

(1) A semi-supervised feature selection algorithm based on Relief for multi-class problems, called MSLIR, is proposed. Because the existing semi-supervised Relief methods are not suitable for multi-class problems, this thesis designs a new scheme to calculate the margin vector for unlabeled data. By assigning a temporary label to an unlabeled sample, MSLIR calculates a temporary margin vector for that sample and computes the inner product of this temporary margin vector with the feature weight vector; the temporary margin vector yielding the largest inner product is taken as the final margin vector of the unlabeled sample. The objective function is then optimized over the labeled and unlabeled margin vectors to obtain the optimal feature weights. Experimental results verify the effectiveness of MSLIR on multi-class datasets.

(2) A semi-supervised Relief feature selection method based on nearest neighbors, called MSLIR-NN, is proposed. To address MSLIR's high computational complexity and poor label prediction for unlabeled data, MSLIR-NN uses a nearest neighbor classifier, built on the labeled samples, to predict the labels of the unlabeled data, after which the margin vectors of all samples are calculated. The margin vectors of the labeled and unlabeled data are combined to optimize the objective function, which yields the feature weights. Experimental results confirm that MSLIR-NN improves the prediction accuracy on unlabeled data.

(3) A semi-supervised Relief feature selection algorithm based on local structure preservation, called LPLIR, is proposed. Because existing Relief-based semi-supervised feature selection methods cannot preserve the local structure of the original data, LPLIR adds a Laplacian regularization term, which ensures that data in the original feature space has the same local structure as in the weighted feature space. Experimental results show that LPLIR outperforms existing semi-supervised feature selection methods.
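The Relief weighting rule and the MSLIR-NN pipeline summarized above can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the function names are hypothetical, the L1 distance and binary-class toy setting are simplifying assumptions, and the margin-vector optimization of MSLIR/LPLIR is omitted.

```python
import numpy as np

def relief_weights(X, y, n_iters=200, seed=0):
    """Classic supervised Relief (a sketch; names are hypothetical).

    For each randomly sampled instance, find its nearest hit (same class)
    and nearest miss (different class) under the L1 distance; a feature
    gains weight when it separates the instance from its miss more than
    from its hit.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)  # L1 distance to every sample
        dists[i] = np.inf                     # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dists, np.inf))    # nearest hit
        miss = np.argmin(np.where(~same, dists, np.inf))  # nearest miss
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iters

def mslir_nn_weights(X_l, y_l, X_u, n_iters=200, seed=0):
    """Sketch of the MSLIR-NN idea: 1-NN pseudo-labels, then Relief.

    Each unlabeled sample takes the label of its nearest labeled
    neighbor; Relief then runs on the combined labeled + pseudo-labeled
    set to produce the feature weights.
    """
    pred = np.array([y_l[np.argmin(np.abs(X_l - x).sum(axis=1))]
                     for x in X_u])
    X = np.vstack([X_l, X_u])
    y = np.concatenate([y_l, pred])
    return relief_weights(X, y, n_iters=n_iters, seed=seed)
```

On a toy dataset where feature 0 cleanly separates two classes and feature 1 is uniform noise, `relief_weights` assigns feature 0 a markedly larger weight, and `mslir_nn_weights` recovers the same ranking from a half-labeled version of the data.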
Keywords/Search Tags: feature selection, Relief, semi-supervised multi-classification, nearest neighbor, local structure preservation