In recent years, many high-dimensional and massive data sets have been generated in everyday life and scientific research, posing an unprecedented challenge to traditional classifiers. A large number of studies show that feature selection can effectively improve classifier performance by eliminating irrelevant and redundant features. In addition, feature selection can serve as a knowledge discovery tool that finds the natural variable model through stable feature selection. Consequently, feature selection has become a research hotspot in many fields, such as statistics, pattern recognition, machine learning, and data mining. This paper focuses on stable feature selection algorithms based on ensemble learning.

When feature selection is used for knowledge discovery, the stability of the algorithm is as important as its classification performance. To obtain feature selection algorithms with both high performance and high stability, this paper proposes three ensemble feature selection algorithms based on the idea of ensemble learning. First, feature selection based on an ensemble energy model is introduced, with a focus on a feature selection framework based on the energy model and on L-Lmba, a feature ranking algorithm built on this framework. The proposed L-Lmba is then used as the base feature selector, with linear combination as the ensemble strategy, to design a simple ensemble feature selection algorithm. Experiments on real data sets show that L-Lmba outperforms classical feature selection algorithms such as Relief and Lmba, and that the ensemble feature selection algorithm is more stable than a single feature selection algorithm. Second, based on the logistic loss function combined with an L2 regularization term, we design a new ensemble feature selection algorithm, L2-en-logsf.
We also analyze the algorithm from the perspective of the energy model and discuss its rotational invariance. Experiments on real data sets show that the algorithm achieves better classification performance and stability than other feature selection methods. Finally, to improve both classification accuracy and stability, EFW, an ensemble feature selection method based on local learning and diversity, is studied. EFW uses the stability provided by the ensemble mechanism, together with diversity, to improve the classification accuracy of ensemble feature selection. Experiments on many real data sets, including small-sample data sets with high dimensionality, show that the proposed algorithm achieves much better classification performance and much higher stability.
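
As an illustration of the general idea of ensemble feature selection with a linear combination strategy, the following minimal Python sketch averages feature scores over bootstrap resamples and measures stability as the mean pairwise Jaccard similarity of the selected subsets. It is not the L-Lmba, L2-en-logsf, or EFW algorithm. The base scorer (absolute difference of class means), the equal-weight average, the Jaccard stability measure, the synthetic data, and all function names are assumptions made for illustration only.

```python
import random
from itertools import combinations


def base_scores(X, y):
    """Hypothetical base selector: |difference of class means| per feature."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:  # degenerate resample containing one class only
            scores.append(0.0)
            continue
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    total = sum(scores)
    # Normalize so that scores from different resamples are comparable.
    return [s / total if total > 0 else 0.0 for s in scores]


def ensemble_scores(X, y, n_members=20, seed=0):
    """Equal-weight linear combination of base scores over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    combined = [0.0] * len(X[0])
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        member = base_scores([X[i] for i in idx], [y[i] for i in idx])
        combined = [c + m / n_members for c, m in zip(combined, member)]
    return combined


def top_k(scores, k):
    """Indices of the k highest-scoring features, as a set."""
    return set(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])


def stability(subsets):
    """Mean pairwise Jaccard similarity; needs at least two non-empty subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


def make_data(n=60, n_features=10, seed=1):
    """Synthetic two-class data in which only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    # Repeat the ensemble with different seeds and compare the selected subsets.
    subsets = [top_k(ensemble_scores(X, y, seed=s), 2) for s in range(5)]
    print(subsets, stability(subsets))
```

Averaging over resamples reduces the variance of the individual rankings, which is the intuition behind expecting an ensemble to be more stable than a single selector. Any other base selector or combination weights could be substituted without changing the structure of the sketch.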