An Improved Correlation-based Feature Subset Selection Method Using OLS Algorithm

Posted on: 2015-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Tang
GTID: 2348330542452504
Subject: Engineering
Abstract/Summary:
In machine learning, it is important to build a robust learning model for high-dimensional data. One of the main tasks is dimension reduction, which can be divided into feature selection and feature extraction. Feature selection has proved to be faster and better suited to data with redundant features, while increasing learning accuracy and improving the comprehensibility of results. Feature selection algorithms typically fall into two categories: feature ranking and subset selection. Feature ranking ranks the features by a metric and eliminates all features that do not reach an adequate threshold. Subset selection searches the set of possible features for the optimal subset.

The main criterion of feature selection is that a good feature subset contains features that are highly correlated with the output (the class), yet uncorrelated with each other. Based on this criterion, we address the problem of feature selection through correlation-based feature clustering and feature ranking. Correlation-based clustering groups features into clusters according to the correlation between pairs of features. Feature ranking orders the features by their contribution to the output.

We previously proposed a feature ranking method that uses sensitivity in an SVM to measure the contribution of each feature. That method does not consider the independence of each contribution, because the contributions of the features are correlated. To address this limitation, we propose an OLS-based feature ranking method, which uses orthogonality to assess the contribution of each feature independently. We also propose an SOLS-based feature ranking method to overcome the time cost of the OLS-based method when the data set is of very high dimension, and an ensemble feature ranking method that aggregates different feature ranking methods to obtain a more robust feature subset.

The two-step feature selection method selects the feature that ranks first in each cluster, which yields a subset of features that are highly correlated with the output, yet uncorrelated with each other. The simulation results show that, for data containing redundant features, our method obtains a good feature subset that reduces the input dimension, improves computational efficiency, and improves learning accuracy.
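The two-step idea described above can be illustrated with a small sketch. The abstract does not give the thesis's clustering rule, correlation threshold, or the SOLS and ensemble variants, so everything below is an assumption made for illustration: features are grouped greedily by absolute Pearson correlation against a hypothetical threshold of 0.8, and they are ranked with classical forward orthogonal least squares using the error reduction ratio. The function names are hypothetical as well.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    den = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0

def cluster_features(columns, threshold=0.8):
    """Step 1 (assumed rule): a feature joins the first cluster whose first
    member it correlates with at or above the threshold, else starts a new one."""
    clusters = []
    for j, col in enumerate(columns):
        for cluster in clusters:
            if abs(pearson(col, columns[cluster[0]])) >= threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])
    return clusters

def ols_rank(columns, y):
    """Step 2 (assumed variant): forward orthogonal least squares. At each step,
    pick the feature whose component orthogonal to the already chosen features
    has the largest error reduction ratio, then orthogonalize the rest against it."""
    n = len(y)
    my = sum(y) / n
    yc = [v - my for v in y]
    yy = sum(v * v for v in yc)
    residual = {}
    for j, col in enumerate(columns):
        m = sum(col) / n
        residual[j] = [v - m for v in col]
    order = []
    while residual:
        best, best_err = None, -1.0
        for j, w in residual.items():
            ww = sum(v * v for v in w)
            wy = sum(a * b for a, b in zip(w, yc))
            err = wy * wy / (ww * yy) if ww > 1e-12 and yy > 0 else 0.0
            if err > best_err:
                best, best_err = j, err
        w_best = residual.pop(best)
        order.append((best, best_err))
        ww = sum(v * v for v in w_best)
        if ww > 1e-12:
            for j in list(residual):
                w = residual[j]
                coef = sum(a * b for a, b in zip(w_best, w)) / ww
                residual[j] = [b - coef * a for a, b in zip(w_best, w)]
    return order

def select_features(columns, y, threshold=0.8):
    """Keep, from each cluster, the feature that appears earliest in the OLS ranking."""
    position = {j: r for r, (j, _) in enumerate(ols_rank(columns, y))}
    return sorted(min(c, key=position.get) for c in cluster_features(columns, threshold))

# Toy data: f1 is a noisy copy of f0 (redundant), f2 is uncorrelated with f0.
f0 = [1, 2, 3, 4, 5, 6]
f1 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
f2 = [1, -1, 0, 0, -1, 1]
y = [2 * a + b for a, b in zip(f0, f2)]
selected = select_features([f0, f1, f2], y)
```

On this toy data, `f0` and `f1` fall into one cluster and `f2` into another, so one feature from each cluster is kept and the redundant copy is discarded. Because each feature is orthogonalized against those already chosen, its error reduction ratio measures only the contribution it adds independently of the others.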
Keywords/Search Tags:Feature Subset Selection, OLS Algorithm, Correlation-based Clustering, Ensemble Feature Ranking