Research On Feature Selection Methods For High-Dimensional Classification

Posted on: 2022-04-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Chen
GTID: 1488306314473574
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Classification is an important research topic in data mining and machine learning. It aims to assign the instances in a dataset to the correct groups based on the information described by their features. However, with the rapid increase in the dimensionality of collected data, many irrelevant and redundant features are inevitably introduced. These features not only increase the complexity of the constructed models but also degrade the performance of learning algorithms, and can even lead to the "curse of dimensionality" and over-fitting. Feature selection is an effective data pre-processing technique: it removes features that are irrelevant to the target concept or redundant, which reduces the computational complexity of machine learning algorithms and improves the classification accuracy and generalization performance of the constructed models. Therefore, developing effective feature selection methods for high-dimensional classification problems has great research and practical value.

Particle swarm optimization (PSO) is an intelligent optimization algorithm inspired by the foraging behavior of birds. It has been widely applied to feature selection because it is efficient and easy to implement. However, most existing PSO-based feature selection methods suffer from a tendency to fall into local optima, high computational cost, premature convergence, and low search efficiency, especially on high-dimensional problems. This thesis focuses on the key issues of PSO in high-dimensional feature selection and proposes three new feature selection methods, as follows.

(1) To address PSO's tendency to fall into local optima, its lack of diversity, and the imbalance between local and global search during feature selection, the original PSO algorithm is improved in three respects: population initialization, the parameter adjustment strategy, and the mechanism for generating next-generation particles. On this basis, a feature selection method based on an improved PSO algorithm (HPSO-SCAC) is proposed. This method can effectively enhance the performance of the machine learning algorithm and significantly improve the efficiency of the search for feature subsets. Experimental results on multiple real high-dimensional classification problems show that the proposed HPSO-SCAC method obtains higher-quality feature subsets and improves PSO's convergence performance during the search process.

(2) PSO guides the particles through the search space according to the individual best position (pbest) and the global best position (gbest). This learning strategy is easy to implement, but it may cause a particle to oscillate if pbest and gbest lie on different sides of its current position, which reduces search efficiency and causes some feature subsets with high classification accuracy to be overlooked during the evolutionary process. To this end, a feature selection method based on feature relevance and a surrogate model (SPSO-CUS) is proposed. The key idea is to use the relevance information of features to generate many promising feature subsets, build a surrogate model to pre-evaluate these feature subsets, and design a particle selection strategy that selects the better-performing particles to form the initial population of the next generation. Experimental results on high-dimensional classification problems show that the proposed SPSO-CUS method obtains feature subsets with stronger discrimination ability than the other feature selection methods compared.

(3) PSO-based feature selection methods face the problems of high computational cost and low search efficiency. Inspired by the idea of knowledge transfer in evolutionary multitasking, a multitasking feature selection method (MTPSO) is proposed for high-dimensional classification. This method transforms a high-dimensional feature selection task into several related low-dimensional feature selection tasks, then finds an optimal feature subset through knowledge transfer between these low-dimensional tasks. Comparative experiments against different types of feature selection methods on multiple high-dimensional classification problems show that MTPSO can find a feature subset with better classification performance in a shorter time.

The methods studied in this thesis are not only suitable for the feature selection problem in high-dimensional classification, but also achieve better performance than other state-of-the-art feature selection methods. In addition, this thesis provides a new perspective and a promising direction for research on high-dimensional feature selection methodology.
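
The pbest/gbest learning strategy mentioned in point (2) can be illustrated with a minimal sketch of standard PSO applied to feature selection. This is not HPSO-SCAC, SPSO-CUS, or MTPSO, whose details are not given in the abstract. It is the textbook velocity update, combined with an assumed encoding in which each particle holds one value in [0, 1] per feature and a feature counts as selected when its value exceeds 0.5. The names `decode`, `pso_feature_selection`, `toy_fitness`, and `RELEVANT`, as well as all parameter values, are hypothetical. The toy fitness only stands in for the classification accuracy a real wrapper method would measure.

```python
import random

def decode(position, threshold=0.5):
    """Assumed encoding: feature j is selected when position[j] > threshold."""
    return [x > threshold for x in position]

def pso_feature_selection(fitness, n_features, n_particles=20, n_iters=50,
                          w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO maximizing fitness(mask); returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]

    # Each particle remembers its own best position (pbest);
    # the swarm shares a single global best position (gbest).
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Inertia term plus attraction toward pbest and toward gbest.
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                # Keep the position inside [0, 1].
                pos[i][j] = min(1.0, max(0.0, pos[i][j] + vel[i][j]))
            fit = fitness(decode(pos[i]))
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest), gbest_fit

# Hypothetical toy problem: features 0, 3 and 7 are the "relevant" ones.
RELEVANT = {0, 3, 7}

def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward relevant features, penalize subset size."""
    hits = sum(1 for j in RELEVANT if mask[j])
    return hits / len(RELEVANT) - 0.01 * sum(mask)

if __name__ == "__main__":
    mask, fit = pso_feature_selection(toy_fitness, n_features=10, seed=1)
    print("selected features:", [j for j, keep in enumerate(mask) if keep])
    print("fitness:", fit)
```

In a real wrapper approach, `toy_fitness` would be replaced by the cross-validated accuracy of a classifier trained on the selected features. That evaluation is the expensive step that motivates the surrogate model in (2) and the task decomposition in (3).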
Keywords/Search Tags: feature selection, particle swarm optimization, high-dimensional classification, data mining, dimensionality reduction