
Feature Selection Algorithm Using SAL Framework

Posted on: 2019-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: M L Zhang
GTID: 2428330548459290
Subject: Engineering
Abstract/Summary:
Feature selection, a combinatorial optimization problem, is an important preprocessing step in data mining that improves the performance of a learning algorithm by removing irrelevant and redundant features. In a real machine learning workflow, after obtaining a dataset, the researcher first applies feature selection and then uses the selected feature subset to train the learner. Feature selection matters for two reasons. First, real tasks often involve high-dimensional datasets, which leads to the curse of dimensionality; if valuable features can be selected, the subsequent learning stage can build its model on only part of the features, which alleviates the problem. Second, removing irrelevant and redundant features reduces the difficulty of the subsequent learning task.

Evolutionary algorithms (such as the forest optimization algorithm and particle swarm optimization) are the most popular choice for solving the feature selection problem: the evolutionary algorithm is discretized to search for the optimal feature subset. Both FSFOA and PSO(4-2) are feature selection algorithms based on evolutionary algorithms. In recent years, some studies have shown that feature selection algorithms based on evolutionary algorithms generalize better than traditional machine learning feature selection algorithms.

It has been observed that various implementations of EAs share a common structure, a cycle of sampling and learning, known as the sampling-and-learning (SAL) framework. Sampling-and-classification (SAC) is a specific version of SAL. In its learning stage, a binary classifier is used as the model that guides the sampling stage toward better samples, and its computational cost is much lower than that of other evolutionary algorithms. However, applying SAC directly to feature selection has limitations. First, no initialization strategy for feature selection has been proposed. Second, the choice of evaluation function is limited and is not suited to the feature selection problem.

This thesis proposes a new initialization strategy that treats feature selection as a discrete search problem to improve the performance of SAC. The evaluation function is redefined so that the accuracy of the classifier on the prediction dataset is the criterion for evaluating feature subsets; the final selected feature subset can therefore classify the dataset well. The resulting FSSAC algorithm can select the best feature subset in a relatively short time, improves classification accuracy, and has good generalization performance.

FSSAC's main improvements are in the initialization and the evaluation function, because most evolutionary algorithms are initialized at random, which introduces a great deal of uncertainty into the later stages of learning. However, its evaluation function focuses mainly on classification accuracy and neglects dimensionality reduction, so FSSAC has clear limitations when a realistic learning task requires dimensionality reduction. The selected feature subset should therefore reduce dimensionality while still improving accuracy. To this end, the initialization phase is improved using the ideas of sequential forward selection and sequential backward selection. In the initial sampling stage, the size of each sample is taken into account: most samples in the sampling set are drawn from small feature subsets, and the remaining samples are drawn from high-dimensional feature subsets. A dimension reduction term is also added to the evaluation function, with a parameter that balances the trade-off between accuracy and dimension reduction. Experiments show that the improved FSSAC not only improves accuracy but also reduces the dimension of the selected feature subset, achieving the expected result.
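The size-aware initialization and the balanced evaluation function described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual implementation: the function names `init_samples` and `evaluate`, the two-thirds share of small subsets, the 10% and 50% size bounds, the weight `alpha`, and the linear weighted-sum form are all hypothetical choices that the abstract does not specify.

```python
import random


def init_samples(n_features, n_samples, small_frac=2 / 3, small_ratio=0.1, rng=None):
    """Draw binary feature masks: most select few features, the rest select many.

    All thresholds here are assumed values for illustration only.
    """
    rng = rng or random.Random(0)
    n_small = round(n_samples * small_frac)            # how many samples are small subsets
    small_max = max(1, int(n_features * small_ratio))  # upper size bound for small subsets
    large_min = max(1, n_features // 2)                # lower size bound for large subsets
    samples = []
    for i in range(n_samples):
        if i < n_small:
            k = rng.randint(1, small_max)
        else:
            k = rng.randint(large_min, n_features)
        chosen = set(rng.sample(range(n_features), k))
        samples.append([1 if j in chosen else 0 for j in range(n_features)])
    return samples


def evaluate(mask, accuracy, alpha=0.9):
    """Weighted sum of classifier accuracy and the fraction of features removed.

    `accuracy` is assumed to be measured elsewhere, by training a classifier
    on the features selected by `mask`.
    """
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * accuracy + (1.0 - alpha) * reduction
```

With `alpha` close to 1 the score is dominated by accuracy; lowering it rewards smaller subsets more strongly, so two subsets with equal accuracy are ranked by how many features they drop.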
Keywords/Search Tags: Feature Selection Problem, Sampling-and-Learning (SAL), Initialization Strategy, Update Mechanism