
Research On Feature Selection Based On Ensemble Learning

Posted on: 2022-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2518306482493614
Subject: Master of Engineering
Abstract/Summary:
As a data preprocessing method in the field of artificial intelligence, feature selection has an extensive and important range of applications. Ensemble learning algorithms have attracted scholars' attention as heuristic methods for solving the feature selection problem. This thesis integrates ensemble learning algorithms into the feature selection process and builds feature selection models. The ensemble algorithms CatBoost and XGBoost define scoring standards while building trees, and these standards are applied as criteria for evaluating the quality of features in feature selection. Features are pre-scored in order to guide the subsequent search of feature subsets.

The CABFS algorithm is proposed based on CatBoost. First, the two indicators provided by CatBoost are used to measure the importance of features from two dimensions; the feature subset is then searched with the proposed search strategy. Pre-scoring features with the metrics provided by CatBoost speeds up the overall evaluation of features and benefits the subsequent search of feature subsets. The algorithm was experimentally compared with 5 recently proposed algorithms on 7 datasets of different dimensionality, and the experiments show that CABFS achieves the highest accuracy on 5 of the datasets.

The BSXGBFS algorithm is proposed based on XGBoost. First, the three metrics provided by XGBoost are used to measure the importance of features from three dimensions; feature subsets are then searched with a new bidirectional search strategy. Combining the three XGBoost metrics pairwise not only speeds up the evaluation and search of features but also enriches the diversity of the candidate feature subsets. The algorithm was experimentally compared with 6 recently proposed algorithms on 11 datasets of different dimensionality. The experimental results show that BSXGBFS not only attains high accuracy and dimensionality reduction on low- and medium-dimensional datasets, but also remains computationally tractable on datasets with more than one hundred thousand or even one million features.
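The abstract does not give the exact CABFS or BSXGBFS procedures, but both share one pattern: pre-score every feature with booster-derived importance metrics, then search for a subset guided by those scores. A minimal sketch of that shared pattern follows; the `scores` dictionary stands in for CatBoost/XGBoost importances and `evaluate` stands in for downstream classifier accuracy, both hypothetical here.

```python
def select_features(importance, evaluate, tol=0.0):
    """Greedy forward search guided by pre-computed feature scores.

    importance: dict mapping feature name -> score (higher = more important)
    evaluate:   callable(subset) -> quality of a candidate feature subset
    """
    # Visit features in descending order of their pre-computed score,
    # so the search is steered by the importance metrics.
    ranked = sorted(importance, key=importance.get, reverse=True)
    subset, best = [], float("-inf")
    for feat in ranked:
        candidate = subset + [feat]
        score = evaluate(candidate)
        if score > best + tol:          # keep the feature only if it helps
            subset, best = candidate, score
    return subset, best

# Toy example: subset quality is the sum of made-up per-feature utilities,
# where "f3" is a redundant feature that hurts the evaluation.
utility = {"f1": 0.5, "f2": 0.3, "f3": -0.2, "f4": 0.1}
scores  = {"f1": 9.0, "f2": 7.0, "f3": 5.0, "f4": 2.0}  # pretend importances
chosen, quality = select_features(scores, lambda s: sum(utility[f] for f in s))
print(chosen)  # -> ['f1', 'f2', 'f4']: "f3" is scored high but gets skipped
```

The real algorithms differ in which importances feed the ranking (two CatBoost indicators for CABFS; pairwise combinations of three XGBoost metrics for BSXGBFS) and in the search direction (BSXGBFS searches bidirectionally), but the score-guided evaluation loop above is the common core.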
Keywords/Search Tags: Artificial intelligence, Feature selection, Ensemble learning, Gradient boosting, Search strategy