| Evolutionary feature selection (short for feature selection based on evolutionary optimization) is an effective method for data dimensionality reduction. It can find an optimal feature subset through global search strategies and has become a popular technique for solving feature selection problems. However, with the rapid development of information technology, the scale of data is growing exponentially: data with large sample sizes and high feature dimensionality are increasingly common, and some data are also class-imbalanced. Existing feature selection methods still suffer from high computational cost and a tendency to converge to local optima. In view of this, this thesis proposes two feature selection algorithms based on particle swarm optimization (PSO) for high-dimensional, imbalanced, large-scale data. Both combine the global search capability of evolutionary feature selection with the fast search capability of filter feature selection. The work consists of the following two parts. (1) A Multi-surrogate-assisted Dual-stage Ensemble Feature Selection Algorithm (MDEFS) is proposed for high-dimensional, large-scale data sets. The first stage uses an ensemble filter feature selection method to delete irrelevant or weakly relevant features, and the second stage uses a PSO-based wrapper ensemble method to select the optimal feature subset from the remaining relevant features. Furthermore, by replacing the entire data set with multiple types of representative samples, a multi-surrogate-assisted swarm search mechanism is developed to reduce the cost of processing large-scale data. Finally, the proposed algorithm is applied to 10 data sets and compared with 6 feature selection algorithms. Experimental results show that it obtains feature subsets with high classification accuracy in less computing time, and that it is a robust and competitive feature selection algorithm.
(2) A Surrogate-Assisted Multi-Phase Ensemble Feature Selection Algorithm With Particle Swarm Optimization (SMEFS-PSO) is proposed for high-dimensional, large-scale data sets with imbalanced classes. The first stage uses filter ensemble feature selection to quickly remove irrelevant or weakly relevant features; the second stage uses a particle-swarm-based ensemble method with global search ability to remove redundant features from the remaining feature space; and the third stage uses a local search strategy to further refine the feature subset obtained in the second stage. Furthermore, to reduce the execution cost of the particle swarm feature selection method in the second stage and to avoid classification errors caused by imbalanced classes, a representative sample selection strategy based on K-nearest neighbors is proposed, and representative sample sets are constructed for the majority and minority classes. Finally, the proposed algorithm is applied to 9 imbalanced data sets and compared with existing feature selection algorithms. Experimental results show that it obtains feature subsets with higher classification accuracy in less computing time and effectively handles class imbalance. The thesis includes 6 figures, 24 tables and 125 references. |
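
The dual-stage idea summarized above (a fast ensemble filter followed by a PSO-based wrapper) can be illustrated with a minimal sketch. This is not the thesis's MDEFS or SMEFS-PSO implementation: the two filters (absolute Pearson correlation and a scaled class-mean gap), the leave-one-out 1-NN fitness, the PSO constants, the subset-size penalty, and the toy data are all assumptions chosen for brevity, and the surrogate (representative sample) mechanism is not modeled.

```python
import math
import random

def pearson_abs(col, y):
    """Filter 1 (assumed): absolute Pearson correlation between a feature and the labels."""
    n = len(col)
    mx, my = sum(col) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in col))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0

def mean_gap(col, y):
    """Filter 2 (assumed): gap between the two class means, scaled by the feature range."""
    pos = [a for a, b in zip(col, y) if b == 1]
    neg = [a for a, b in zip(col, y) if b == 0]
    spread = max(col) - min(col)
    if not pos or not neg or spread == 0:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg)) / spread

def ensemble_filter(X, y, keep):
    """Stage 1: sum the ranks assigned by each filter and keep the `keep` best features."""
    d = len(X[0])
    cols = [[row[j] for row in X] for j in range(d)]
    rank_sum = [0] * d
    for score in (pearson_abs, mean_gap):
        scores = [score(c, y) for c in cols]
        for rank, j in enumerate(sorted(range(d), key=lambda j: -scores[j])):
            rank_sum[j] += rank
    return sorted(sorted(range(d), key=lambda j: rank_sum[j])[:keep])

def loo_1nn_accuracy(X, y, feats):
    """Wrapper fitness (assumed): leave-one-out 1-NN accuracy using only `feats`."""
    if not feats:
        return 0.0
    hits = 0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((xi[j] - X[k][j]) ** 2 for j in feats))
        hits += y[nearest] == y[i]
    return hits / len(X)

def binary_pso(X, y, candidates, particles=8, iters=15, seed=0):
    """Stage 2: binary PSO (sigmoid transfer function) over the surviving features."""
    rng = random.Random(seed)
    d = len(candidates)

    def decode(bits):
        return [candidates[j] for j in range(d) if bits[j]]

    def fit(bits):
        # Accuracy minus a small subset-size penalty (the 0.01 weight is an assumption).
        return loo_1nn_accuracy(X, y, decode(bits)) - 0.01 * sum(bits) / d

    pos = [[rng.random() < 0.5 for _ in range(d)] for _ in range(particles)]
    vel = [[0.0] * d for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pfit = [fit(p) for p in pos]
    g = max(range(particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(iters):
        for i in range(particles):
            for j in range(d):
                v = (0.7 * vel[i][j]
                     + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                     + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp the velocity
                pos[i][j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            f = fit(pos[i])
            if f > pfit[i]:
                pbest[i], pfit[i] = pos[i][:], f
                if f > gfit:
                    gbest, gfit = pos[i][:], f
    return decode(gbest), gfit

def make_toy_data(n=40, d=12, seed=1):
    """Hypothetical data: feature 0 separates the classes, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[2.0 * label + rng.gauss(0, 0.3) if j == 0 else rng.gauss(0, 1)
          for j in range(d)] for label in y]
    return X, y

X, y = make_toy_data()
kept = ensemble_filter(X, y, keep=5)          # stage 1: cheap pruning
selected, fitness = binary_pso(X, y, kept)    # stage 2: wrapper search
```

Running the filter first shrinks the wrapper's search space from 12 features to 5 here, which is the motivation the abstract gives for combining the two families of methods. A surrogate-assisted variant would evaluate the fitness on representative samples instead of the full data set.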