
Research On Feature Selection Algorithms Based On Decision Tree For High-dimensional Data

Posted on: 2023-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: M H Jiang
Full Text: PDF
GTID: 2568307022498534
Subject: Software engineering

Abstract/Summary:
The advent of the big data era has produced a wealth of data. How to obtain useful information and form knowledge from such large-scale, high-dimensional data sets with redundant features and noise, and how to transform it into economic or social benefit, has become an important research topic in data mining and machine learning. Analyzing high-dimensional data first requires dimensionality reduction, for which feature extraction and feature selection are the two commonly used approaches: feature extraction obtains new features through transformation or mapping, whereas feature selection filters out the most effective of the original features.

After studying existing feature selection algorithms, this thesis proposes a new decision-tree-based feature selection method for high-dimensional data, K-Longitudinal-Split Feature Selection (KLSFS). The method first partitions the sample data set longitudinally along the feature dimension to generate K data subsets. It then builds a decision tree model based on the Gini index for each data subset and selects features according to the importance metric given by the model, yielding sub-optimal feature subsets. Finally, all sub-optimal feature subsets are evaluated with classification models, the parameters of the KLSFS method are optimized, and the optimal feature subset is obtained.

KLSFS is compared horizontally with other feature selection algorithms under the same experimental conditions, using the classification accuracy of the classification model on the test set as the index for evaluating each algorithm's performance. Experiments show that applying KLSFS to high-dimensional data can obtain a feature subset with as few features as possible and better classification performance while keeping the running time short, which greatly helps the data analysis of subsequently applied learning algorithms. However, there is still room for improvement in the choice of feature selection methods and in parameter optimization.
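The pipeline described above (longitudinal split into K feature groups, a Gini-based decision tree per group, importance-based selection, then wrapper evaluation of the merged subset) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the group size `top_m`, the KNN wrapper classifier, and the candidate values of K are illustrative choices, not parameters specified by the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def klsfs(X, y, k, top_m=5, random_state=0):
    """Sketch of K-Longitudinal-Split Feature Selection (KLSFS).

    1. Split the feature dimension into k disjoint groups (the
       "longitudinal" split of the data set).
    2. Fit a Gini-criterion decision tree on each group and keep its
       top_m most important features (a sub-optimal subset).
    3. Score the union of the sub-optimal subsets with a downstream
       classifier; the caller repeats over k to optimize the parameter.
    """
    rng = np.random.default_rng(random_state)
    perm = rng.permutation(X.shape[1])      # shuffle features before grouping
    groups = np.array_split(perm, k)        # k longitudinal feature groups

    selected = []
    for group in groups:
        tree = DecisionTreeClassifier(criterion="gini",
                                      random_state=random_state)
        tree.fit(X[:, group], y)
        # indices of the most important features within this group
        order = np.argsort(tree.feature_importances_)[::-1][:top_m]
        selected.extend(group[order])

    selected = np.sort(np.array(selected))
    # evaluate the merged subset with a wrapper classifier (KNN here)
    score = cross_val_score(KNeighborsClassifier(),
                            X[:, selected], y, cv=5).mean()
    return selected, score

# toy high-dimensional data: 500 features, only 20 of them informative
X, y = make_classification(n_samples=300, n_features=500,
                           n_informative=20, random_state=0)
# optimize K by comparing the wrapper score across candidate values
subset, score = max((klsfs(X, y, k) for k in (2, 4, 8)),
                    key=lambda r: r[1])
print(len(subset), round(score, 3))
```

Shuffling the features before splitting keeps each group representative; since each tree sees only its own group, at most `k * top_m` features survive to the final evaluation, which is how the method keeps the selected subset small on high-dimensional data.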
Keywords/Search Tags: High-dimensional data, Dimensionality reduction, Feature selection, Decision tree, Feature grouping