
Research On Feature Selection Algorithms Based On Decision Tree For High-dimensional Data

Posted on: 2023-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: M H Jiang
Full Text: PDF
GTID: 2568307022498534
Subject: Software engineering

Abstract/Summary:
The advent of the big data era has produced a wealth of data. How to obtain useful information and form knowledge from such large-scale, high-dimensional data sets with redundant features and noise, and how to transform it into economic or social benefit, has become an important research topic in data mining and machine learning. Analyzing high-dimensional data first requires dimensionality reduction, for which feature extraction and feature selection are the two commonly used approaches: feature extraction obtains new features through transformation or mapping, whereas feature selection filters out the most effective of the original features.

After studying existing feature selection algorithms, this thesis proposes a new decision-tree-based feature selection method for high-dimensional data, K-Longitudinal-Split Feature Selection (KLSFS). The method first partitions the sample data set longitudinally along the feature dimension to generate K data subsets. It then builds a decision tree model based on the Gini index for each data subset and selects features according to the importance metric given by the model, yielding sub-optimal feature subsets. Finally, all sub-optimal feature subsets are evaluated with classification models, the parameters of the KLSFS method are optimized, and the optimal feature subset is obtained.

KLSFS is compared horizontally with other feature selection algorithms under the same experimental conditions, using the classification accuracy of the classification model on the test set as the index for evaluating each algorithm's performance. Experiments show that applying KLSFS to high-dimensional data can obtain a feature subset with as few features as possible and better classification performance while keeping the running time short, which greatly helps the data analysis of subsequently applied learning algorithms. However, there is still room for improvement in the choice of feature selection methods and in parameter optimization.
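The pipeline described above (longitudinal split into K feature groups, a Gini-based decision tree per group, importance-based selection, then wrapper evaluation of the merged subset) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the group size `top_m`, the KNN wrapper classifier, and the candidate values of K are illustrative choices, not parameters specified by the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def klsfs(X, y, k, top_m=5, random_state=0):
    """Sketch of K-Longitudinal-Split Feature Selection (KLSFS).

    1. Split the feature dimension into k disjoint groups (the
       "longitudinal" split of the data set).
    2. Fit a Gini-criterion decision tree on each group and keep its
       top_m most important features (a sub-optimal subset).
    3. Score the union of the sub-optimal subsets with a downstream
       classifier; the caller repeats over k to optimize the parameter.
    """
    rng = np.random.default_rng(random_state)
    perm = rng.permutation(X.shape[1])      # shuffle features before grouping
    groups = np.array_split(perm, k)        # k longitudinal feature groups

    selected = []
    for group in groups:
        tree = DecisionTreeClassifier(criterion="gini",
                                      random_state=random_state)
        tree.fit(X[:, group], y)
        # indices of the most important features within this group
        order = np.argsort(tree.feature_importances_)[::-1][:top_m]
        selected.extend(group[order])

    selected = np.sort(np.array(selected))
    # evaluate the merged subset with a wrapper classifier (KNN here)
    score = cross_val_score(KNeighborsClassifier(),
                            X[:, selected], y, cv=5).mean()
    return selected, score

# toy high-dimensional data: 500 features, only 20 of them informative
X, y = make_classification(n_samples=300, n_features=500,
                           n_informative=20, random_state=0)
# optimize K by comparing the wrapper score across candidate values
subset, score = max((klsfs(X, y, k) for k in (2, 4, 8)),
                    key=lambda r: r[1])
print(len(subset), round(score, 3))
```

Shuffling the features before splitting keeps each group representative; since each tree sees only its own group, at most `k * top_m` features survive to the final evaluation, which is how the method keeps the selected subset small on high-dimensional data.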
Keywords/Search Tags: High-dimensional data, Dimensionality reduction, Feature selection, Decision tree, Feature grouping