
Research On Feature Selection Algorithm Of High-dimensional Data Based On Intelligent Optimization

Posted on: 2024-09-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Q Sun
Full Text: PDF
GTID: 1528307340474024
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the continuous development of emerging fields such as the Internet of Things and artificial intelligence, the amount of data available to people is growing at an unprecedented rate. However, because these large datasets contain redundant and irrelevant features, they not only slow down data analysis algorithms but also degrade the accuracy of predictions. Measures therefore need to be taken to ensure the quality and accuracy of the data. Feature selection is the process of selecting the most representative features from a dataset to form an optimal feature subset, so that the research model achieves optimal performance while the cost of learning is reduced. In addition, feature selection for high-dimensional data faces the “curse of dimensionality”, which makes dimensionality reduction difficult. Based on the complex relationships between features, this study proposes three particle swarm feature selection architectures for high-dimensional data. The specific research work is as follows:

(1) To address the low search efficiency of traditional particle swarm feature selection algorithms, a particle swarm feature selection algorithm based on maximum separation and minimum redundancy is proposed. Based on the separability, grey correlation and redundancy of the features, a maximum-separation minimum-redundancy criterion and a pseudo-knowledge-driven correlation coefficient are proposed. Combined with the idea of the Markov blanket, an approximate Markov blanket is defined and the original feature space is reduced to a suboptimal feature subspace. This shrinks the search space of the particle swarm, enhances the search ability of the particles and accelerates the convergence of the algorithm. Experiments are carried out on 12 datasets from different sources, with 9 feature selection algorithms based on feature correlation used for comparison. The experimental results show that the proposed algorithm obtains more discriminative feature subsets.

(2) To address the slow convergence, loss of diversity and tendency to fall into local optima of particle swarm optimization when performing feature selection on high-dimensional data, a two-stage concise particle swarm feature selection algorithm guided by feature separability is proposed. The first stage is a screening stage: based on a measure of feature separability, a separation probability is defined for each feature, an initial population strategy is constructed from it, and the original feature set is preliminarily screened. The second stage is an evolutionary search stage. First, a concise particle swarm optimization algorithm is proposed that folds the influence of particle velocity into the position update during the iterations, reducing the classical second-order difference equation to a first-order one; this simplifies the update formula that controls the particles and reduces the number of parameters in the algorithm. The convergence of the concise particle swarm optimization algorithm is then analysed from three perspectives: system processes, convex optimization theory, and Markov chains, and it is proven that the algorithm converges to the global optimum with a certain probability.
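As an illustration of the first-order update described above, the following Python sketch shows a velocity-free particle update for feature selection. It is a minimal sketch under stated assumptions, not the dissertation's exact formulation: the coefficients c1 and c2, the clipping of positions to [0, 1] and the 0.6 decoding threshold are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def concise_pso_step(x, pbest, gbest, c1=1.5, c2=1.5, threshold=0.6):
        # One first-order update of a particle's continuous position x in [0, 1]^d.
        # x, pbest, gbest: arrays of shape (d,) holding the current, personal-best
        # and global-best positions. No velocity is carried between iterations, so
        # the classical second-order recurrence collapses to a first-order one.
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        x_new = x + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x_new = np.clip(x_new, 0.0, 1.0)
        selected = x_new > threshold  # decode: a feature is selected if its entry is large
        return x_new, selected

    # toy usage in a 10-dimensional feature space
    x, mask = concise_pso_step(rng.random(10), pbest=rng.random(10), gbest=rng.random(10))

Removing the velocity term means each particle is driven only by its current position and the two best positions, which is what reduces both the order of the recurrence and the number of tunable parameters.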
Next, to enhance the self-learning ability of particles in the population and improve their guidance during the iterations, an elite strategy is proposed. The local and global best update mechanisms designed in the thesis consider not only the fitness of a feature subset but also its size. Finally, to keep the algorithm from being trapped in local optima, an escape strategy for the best particle is designed by incorporating a chaotic mechanism, which lowers the probability of particles stagnating at local optima. Experiments are conducted on 16 datasets, and the proposed algorithm is compared with five particle swarm-based feature selection algorithms. The comparative results demonstrate the effectiveness of the proposed algorithm.

(3) To address feature selection in ultra-high-dimensional data (gene data), a three-stage concise particle swarm feature selection algorithm guided by feature clustering is proposed. The algorithm combines the advantages of the Filter approach, feature clustering and the Wrapper approach, and establishes a three-stage feature selection framework of filtering, clustering and searching. Following the idea of “divide and conquer”, it filters and then clusters similar features, which significantly reduces the space for the subsequent feature search. In the feature clustering step, a fuzzy clustering method is adopted, and maximum likelihood estimation is used to estimate the intrinsic dimension of the gene dataset; the estimate provides a reference for the number of feature clusters, which reduces the influence of human factors on the clustering result. At the same time, the algorithm defines a group probability (G-probability) for evaluating feature subsets from the symmetric uncertainty of the features, and designs the population initialization operator accordingly, which further improves its search performance. The algorithm is applied to 11 ultra-high-dimensional gene datasets and compared with 6 PSO-based gene selection algorithms. The experimental results show that the proposed C-ID-HCMPSO algorithm finds gene subsets with stronger classification ability.
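The following Python sketch illustrates the kind of maximum likelihood intrinsic-dimension estimate referred to above, in the spirit of the Levina-Bickel estimator. The abstract does not specify the exact estimator, so the formula, the neighbourhood size k and the toy data are assumptions for illustration only; the resulting value could serve as a reference for choosing the number of feature clusters.

    import numpy as np

    def mle_intrinsic_dimension(X, k=10):
        # Average maximum-likelihood intrinsic-dimension estimate over all samples.
        # X: array of shape (n_samples, n_features); k: number of nearest neighbours.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        dists = np.sqrt(d2)
        dists.sort(axis=1)               # column 0 is each point's zero distance to itself
        T = dists[:, 1:k + 1]            # distances to the k nearest neighbours
        # per-sample estimate: inverse mean log-ratio of the k-th to the j-th distance
        log_ratio = np.log(T[:, -1][:, None] / T[:, :-1])
        m_hat = (k - 1) / log_ratio.sum(axis=1)
        return float(m_hat.mean())

    # toy usage: data lying in a 3-dimensional subspace embedded in 50 dimensions
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(200, 3))
    X = Z @ rng.normal(size=(3, 50))
    print(round(mle_intrinsic_dimension(X, k=10), 1))   # expected to be close to 3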
Keywords/Search Tags: High-dimensional data, Feature selection, Particle swarm optimization algorithm, Pseudo-knowledge-driven correlation coefficient, Maximum separation minimum redundancy, Feature clustering