Problem Knowledge-driven Particle Swarm Optimization Algorithm For Feature Selection

Posted on: 2022-10-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X F Song
Full Text: PDF
GTID: 1488306533968379
Subject: Control theory and control engineering
Abstract/Summary:
With the rapid development of emerging technologies and application models such as the Internet of Things and artificial intelligence, the number of attributes (or features) in data obtained by humans is growing at an unprecedented rate. Redundant and/or irrelevant features in data not only slow down learning algorithms but also significantly reduce their learning accuracy. The purpose of feature selection is to select a subset of key features from the original feature set to form an optimal feature subset, achieving the best performance index while reducing the learning cost. However, when dealing with high-dimensional data, most existing evolutionary feature selection methods still suffer from problems such as the “curse of dimensionality” and high computational cost. In view of this, the dissertation studies efficient feature selection algorithms based on particle swarm optimization (PSO) for high-dimensional data by fully considering the domain knowledge of the feature selection problem.

Firstly, in view of the low search efficiency of existing PSO-based feature selection algorithms, a feature correlation-driven bare-bones particle swarm optimization for feature selection is proposed. A swarm initialization method based on the degree of correlation between features and class labels is designed to accelerate the search. Two new local search strategies based on a relevance-redundancy indicator, namely a supplementary operator and a deletion operator, are proposed to improve the local search ability of the swarm. An adaptive mutation strategy is given to help the swarm escape from local convergence. Meanwhile, a particle position update mechanism with fewer control parameters is introduced to make the proposed algorithm easier to use.

Secondly, aiming at the “curse of dimensionality” problem of existing PSO-based feature selection algorithms, a feature importance-driven variable-size cooperative coevolutionary particle swarm optimization for feature selection is proposed. A feature space division mechanism based on feature importance is given, by which the high-dimensional feature space is divided into several low-dimensional feature subspaces. A multi-swarm coevolution mechanism is adopted to search multiple low-dimensional feature subspaces at the same time, and an initialization strategy for subswarm size based on the importance of each feature subspace is proposed to allocate computing resources reasonably. An adaptive adjustment mechanism that combines the evolution speed and diversity of the swarm is designed to adjust the size of each subswarm dynamically. Meanwhile, a cluster-based particle deletion strategy and a feature importance-based particle generation strategy are proposed to ensure the quality of each subswarm.

Thirdly, considering both the “curse of dimensionality” and the high computational cost of existing PSO-based feature selection algorithms, a feature clustering-guided fast hybrid particle swarm optimization for feature selection is proposed. The algorithm divides the feature selection process into three phases with complementary functions. In the first phase, an adaptive filter feature selection method with low computational cost is given to delete irrelevant or weakly relevant features. In the second phase, a correlation-guided fast feature clustering strategy is proposed to group similar or redundant features into the same feature class, reducing the search space of the subsequent PSO method. In the third phase, an improved integer PSO algorithm is designed to select the most representative feature from each feature class simultaneously to form the final feature subset. Meanwhile, a disturbance-based differential operator suitable for integer coding is introduced to prevent the swarm from falling into a local optimum.

Fourthly, considering the large-scale high-dimensional data that existing PSO-based feature selection algorithms find difficult to handle, a surrogate sample-assisted particle swarm optimization for feature selection is proposed. A non-repetitive uniform sampling strategy is introduced to divide the original large-scale sample set into several small-scale sample subsets. Treating each sample subset as a surrogate unit, the dissertation proposes an improved feature clustering strategy with multi-surrogate unit collaboration to reduce the computational cost of the clustering strategy proposed in Section 5 and to reduce the search space of the subsequent PSO method. Following that, an integer PSO algorithm with integrated surrogate-assisted evaluation is presented, which can generate the final feature subset at a low cost per individual evaluation. Meanwhile, a construction and management strategy for the integrated surrogate is designed to ensure the accuracy with which the integrated surrogate evaluates particles.

Finally, the proposed theories and methods are applied to a number of typical real datasets provided by UCI and other institutions, and compared with existing representative feature selection algorithms. Experimental results verify the feasibility and effectiveness of the above four algorithms. The research results enrich existing feature selection theories and methods, improve the performance of evolutionary feature selection methods in processing high-dimensional data, and provide a series of effective and reliable data preprocessing techniques for related machine learning algorithms.
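
To make the first contribution more concrete, the following is a minimal sketch of how a bare-bones PSO position update can be used for feature selection. It is not the dissertation's implementation. The update rule is the standard bare-bones PSO rule (each coordinate is sampled from a Gaussian centred between the personal best and the global best, so no inertia weight or acceleration coefficients are needed). The function names, the 0.5 decoding threshold, and the relevance-biased initialization are assumptions introduced only for illustration.

```python
import random


def bare_bones_update(pbest, gbest, rng):
    """Standard bare-bones PSO update: sample each coordinate from
    N((p + g) / 2, |p - g|), then clamp to [0, 1]."""
    new_pos = []
    for p, g in zip(pbest, gbest):
        mean = (p + g) / 2.0
        std = abs(p - g)
        # When pbest and gbest agree, the coordinate stays at their value.
        x = rng.gauss(mean, std) if std > 0 else mean
        new_pos.append(min(1.0, max(0.0, x)))
    return new_pos


def decode(position, threshold=0.5):
    """Assumed decoding: feature j is selected when position[j] > threshold."""
    return [j for j, v in enumerate(position) if v > threshold]


def relevance_biased_init(relevance, rng):
    """Hypothetical initialization: a feature with higher class-label
    relevance is more likely to start in the 'selected' half of [0, 1]."""
    top = max(relevance) or 1.0  # avoid division by zero if all scores are 0
    position = []
    for r in relevance:
        if rng.random() < r / top:
            position.append(rng.uniform(0.5, 1.0))
        else:
            position.append(rng.uniform(0.0, 0.5))
    return position


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up relevance scores for five features, for demonstration only.
    scores = [0.9, 0.1, 0.6, 0.0, 0.3]
    particle = relevance_biased_init(scores, rng)
    print("initial subset:", decode(particle))
    print("updated subset:", decode(bare_bones_update(particle, [1, 0, 1, 0, 0], rng)))
```

Because the update has no tunable coefficients, the only parameter in this sketch is the decoding threshold, which is in the spirit of the "fewer control parameters" remark in the abstract.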
Keywords/Search Tags: particle swarm optimization, evolutionary feature selection, cooperative coevolutionary, feature clustering, surrogate-assisted