
Cost Sensitive Feature Selection Based On Data Correlation

Posted on: 2019-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: S L Yu
Full Text: PDF
GTID: 2428330545485541
Subject: Applied Mathematics
Abstract/Summary:
In the era of big data, feature selection plays an important role in data mining and machine learning. Traditional feature selection algorithms are judged effective when they achieve high classification accuracy. In real applications, however, collecting the value of a feature incurs a test cost, and different misclassifications incur different misclassification costs. In this dissertation, three cost-sensitive feature selection algorithms based on data correlation are proposed.

1. A cost-sensitive feature selection algorithm based on neighbor preserving. First, the neighborhood matrix is constructed from the samples. Then the importance degree of each feature is calculated from the cost matrix and the neighborhood matrix. Finally, the corresponding feature selection algorithm is given.

2. A cost-sensitive feature selection algorithm based on rough sets and the Laplacian score. First, rough sets are used to compute the core of the feature set. Second, test costs are generated from three different distributions. Finally, the feature importance degree is obtained by combining the Laplacian score with the test cost, and the feature selection algorithm is proposed.

3. A cost-sensitive feature selection algorithm via the ℓ2,1-norm. First, a loss function is created that trades off misclassification costs against test costs. Second, the ℓ2,1-norm is used for its rotational invariance and robustness to outliers. Then an orthogonal constraint term is added so that the selected features are independent of one another. Finally, the feature selection algorithm is proposed and its convergence is proved.
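The abstract does not give the formulas for the second algorithm. As a minimal sketch of its two building blocks, the following computes the classical Laplacian score of each feature (smaller score = better preservation of the local neighborhood structure) and then combines it with a per-feature test cost. The score/cost combination shown is purely illustrative and is not the thesis's formula; all function names and parameters are assumptions.

```python
import numpy as np

def laplacian_scores(X, k=5, t=1.0):
    """Laplacian score of each feature; smaller means the feature better
    preserves the local neighborhood structure of the samples."""
    n, d = X.shape
    # pairwise squared distances between samples
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    np.fill_diagonal(dist2, np.inf)
    # symmetric kNN graph with heat-kernel weights
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist2[i])[:k]
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / t)
    S = np.maximum(S, S.T)
    deg = S.sum(axis=1)                    # degree of each sample
    L = np.diag(deg) - S                   # graph Laplacian
    scores = np.empty(d)
    for r in range(d):
        f = X[:, r]
        f = f - (f @ deg) / deg.sum()      # remove the degree-weighted mean
        denom = f @ (deg * f)
        scores[r] = (f @ L @ f) / denom if denom > 1e-12 else np.inf
    return scores

def cost_sensitive_importance(scores, test_cost, lam=1.0):
    # Illustrative combination (not the thesis formula): reward a small
    # Laplacian score and penalize an expensive feature.
    return 1.0 / ((scores + 1e-12) * (1.0 + lam * np.asarray(test_cost)))
```

A feature that separates two well-defined sample clusters gets a much smaller score than a pure-noise feature, and a high test cost then pushes it down the ranking.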
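The third algorithm is only summarized above. As a sketch of the general technique it builds on, the following solves plain ℓ2,1-norm-regularized regression by iteratively reweighted least squares and ranks features by the row norms of the weight matrix, which the ℓ2,1 penalty drives to zero row by row. The thesis's cost terms and orthogonal constraint are omitted here, and the objective, names, and parameters are illustrative assumptions, not the dissertation's method.

```python
import numpy as np

def l21_norm(W):
    # ||W||_{2,1}: sum of the l2 norms of the rows of W
    return np.sum(np.linalg.norm(W, axis=1))

def l21_feature_ranking(X, Y, gamma=0.01, iters=30):
    """Rank features by solving
        min_W ||X W - Y||_F^2 + gamma * ||W||_{2,1}
    with iteratively reweighted least squares. Row r of W collects the
    weights of feature r, so its l2 norm measures feature importance."""
    W = np.linalg.lstsq(X, Y, rcond=None)[0]          # warm start
    for _ in range(iters):
        row_norms = np.linalg.norm(W, axis=1) + 1e-8  # avoid division by zero
        D = np.diag(1.0 / row_norms)
        # stationarity condition: (X'X + (gamma/2) D) W = X'Y
        W = np.linalg.solve(X.T @ X + 0.5 * gamma * D, X.T @ Y)
    row_norms = np.linalg.norm(W, axis=1)
    return np.argsort(-row_norms), row_norms
```

On synthetic data where the targets depend only on the first two features, those two rows of W keep large norms while the remaining rows shrink toward zero, so they appear first in the ranking.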
Keywords/Search Tags: cost-sensitive learning, feature selection, rough sets, neighbor, ℓ2,1-norm