Feature Selection Based On Particle Swarm Optimization For High-dimensional Imbalanced Data

Posted on: 2022-04-22; Degree: Master; Type: Thesis
Country: China; Candidate: Y H Wang; Full Text: PDF
GTID: 2518306533972959; Subject: Control Engineering
Abstract/Summary:
As an important method of dimensionality reduction, feature selection aims to select an optimal feature subset from the original feature set, reducing the learning cost while maximizing given performance measures. In many real applications, such as medical data and bioinformatics, the original data obtained by decision makers not only includes a large number of features (high dimension) but is also class-imbalanced and may even contain missing values, which poses a major challenge to existing feature selection algorithms. In view of this, this thesis considers high-dimensional imbalanced data and high-dimensional imbalanced data with missing values respectively, and studies two feature selection algorithms based on particle swarm optimization (PSO) that make full use of the global search capability of PSO. The main work of this thesis includes the following two parts:

(1) Targeting the multimodality and class imbalance of high-dimensional imbalanced data, a niching particle swarm feature selection algorithm with over-sampling technology (NPSO-OS) is proposed. First, an oversampling method based on the synthesis of minority boundary samples is given to generate high-quality samples for the minority class and balance the class distribution of the dataset. Then, a niching particle swarm feature selection algorithm guided by an elite solution set is proposed. In the algorithm, an update strategy for the elite solution set based on the ratio of fitness to Jaccard distance is developed to prevent the algorithm from losing optimal solutions. Two new operators, i.e., an adaptive neighborhood adjustment mechanism and a stagnant niching search strategy guided by elite solutions, are developed so that the swarm searches for multiple optimal solutions at the same time. Finally, NPSO-OS is applied to several real datasets and compared with existing classical feature selection algorithms. Experimental results show that NPSO-OS can obtain multiple highly competitive feature subsets, including the global optimal one.

(2) Targeting the missing values and class imbalance in high-dimensional imbalanced data, a particle swarm feature selection method with fuzzy clustering (PSOFS-FC) is proposed. First, an improved F-measure based on filling risk (RF-measure) is defined to evaluate the influence of missing data on the performance of feature selection under class imbalance. Then, taking the RF-measure as the objective function, a PSO-based method with fuzzy clustering is proposed. In the algorithm, a swarm initialization strategy guided by fuzzy clustering is presented to improve the search efficiency of PSO; by integrating the correlation between features and class labels, a local pruning operator based on feature importance is developed to improve the local search capability of the swarm. Finally, PSOFS-FC is applied to several real datasets and compared with existing classical feature selection algorithms. Experimental results show that PSOFS-FC can obtain, in a short time, good feature subsets with excellent classification performance and less missing data, and is an efficient feature selection method for imbalanced data with missing values.

The thesis has 20 figures, 27 tables, and 109 references.
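The abstract names two standard quantities without defining them: the Jaccard distance between feature subsets (used in the elite-set update of NPSO-OS) and the F-measure (the basis of the RF-measure in PSOFS-FC). The Python sketch below shows only the textbook forms of these two building blocks. It is not the thesis's method: the exact fitness-to-distance ratio and the filling-risk correction are not given here, and the function names and the 0/1 mask encoding of a feature subset are assumptions made for illustration.

```python
from typing import Sequence


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard distance between two feature subsets encoded as 0/1 masks.

    Assumption: position i is 1 when feature i is selected.
    """
    if len(a) != len(b):
        raise ValueError("masks must have the same length")
    inter = sum(1 for x, y in zip(a, b) if x and y)   # features in both subsets
    union = sum(1 for x, y in zip(a, b) if x or y)    # features in either subset
    # Two empty subsets are treated as identical (distance 0).
    return 0.0 if union == 0 else 1.0 - inter / union


def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Standard F-beta score for the minority (positive) class.

    Plain F-measure only; the thesis's filling-risk term is not modeled.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else (1 + b2) * tp / denom


if __name__ == "__main__":
    # Two subsets over four features sharing one selected feature.
    print(jaccard_distance([1, 1, 0, 0], [1, 0, 1, 0]))
    # 8 true positives, 2 false positives, 2 false negatives.
    print(f_measure(tp=8, fp=2, fn=2))
```

A distance of this kind lets an elite set keep solutions that are both fit and mutually dissimilar, which is the usual motivation for combining fitness with a diversity measure in multimodal search.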
Keywords/Search Tags: feature selection, particle swarm optimization, high-dimensional imbalance, missing value, multimodal