Continuous Feature Selection And Its Application In Decision Information System

Posted on:2020-08-31

Degree:Master

Type:Thesis

Country:China

Candidate:S W Yang

Full Text:PDF

GTID:2518306131456344

Subject:Computer Science and Technology

Abstract/Summary:

Data preprocessing is an important step in machine learning that precedes model building. For incomplete data, filling in missing values and selecting important features are two key parts of this step. The data sources used for feature selection often have problems such as missing values, and existing feature selection algorithms either cannot determine the optimal feature subset or run slowly. To address these shortcomings, this thesis designs and implements a data processing flow that goes from an incomplete decision information system to a complete information subsystem. The main work consists of the following parts.

First, the incomplete data are pre-filled with mean values and the missing positions are marked. The completed data are then clustered, and the clustering result is used to refill the missing positions. This recursive filling is repeated until the data are stable or the number of iterations exceeds a threshold. In the experimental stage, incomplete data are generated by random deletion, and the squared errors between the original data and the filled data are compared across different missing-value filling methods. The experimental results show that data filled by the clustering and recursive filling method are filled more accurately and differ less from the original data.

Second, a random forest model ranks the features of the completed data, and a local traversal eliminates useless features. A second selection then filters the remaining features with a distance-based forward search strategy, which yields the final feature subset. The local traversal improves execution efficiency, and the forward selection algorithm solves the problem that traditional methods cannot determine the optimal number of features. The experimental results show that the proposed method selects feature subsets more effectively and improves the classification accuracy of the model.

In summary, this thesis implements a process for completing incomplete data and selecting important features. The process is applied to lithology identification data and has achieved good application results.
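
The filling procedure described in the abstract (mean pre-fill, marking of missing positions, clustering, and repeated refilling until the data are stable or an iteration limit is reached) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the abstract does not name the clustering algorithm, so plain k-means is assumed here, and the function names (`mean_prefill`, `kmeans`, `cluster_recursive_fill`), the parameters `k`, `max_iter`, `tol`, `seed`, and the use of `None` as the missing-value marker are all hypothetical. It also assumes every column has at least one observed value.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_prefill(rows):
    """Fill each None with its column mean; also return a mask of missing positions."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))  # assumes >= 1 observed value
    filled = [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]
    mask = [[v is None for v in r] for r in rows]
    return filled, mask


def kmeans(data, k, rng, n_iter=20):
    """Plain Lloyd's k-means (an assumed choice); returns labels and cluster centers."""
    centers = [list(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in data]
        # Recompute each center as the mean of its members (empty clusters keep theirs).
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def cluster_recursive_fill(rows, k=2, max_iter=10, tol=1e-6, seed=0):
    """Refill the marked positions from cluster centers until stable or max_iter is hit."""
    rng = random.Random(seed)
    filled, mask = mean_prefill(rows)
    for _ in range(max_iter):
        labels, centers = kmeans(filled, k, rng)
        change = 0.0
        for i, row in enumerate(filled):
            for j in range(len(row)):
                if mask[i][j]:  # only originally missing positions are overwritten
                    new_value = centers[labels[i]][j]
                    change = max(change, abs(new_value - row[j]))
                    row[j] = new_value
        if change < tol:  # data are considered stable
            break
    return filled


# Small made-up example: two groups of rows, each with one missing value.
example = [[1.0, 1.0], [1.2, 0.9], [0.9, None],
           [10.0, 10.0], [10.2, 9.8], [9.9, None]]
print(cluster_recursive_fill(example, k=2))
```

Observed values are never modified; only positions marked as missing are updated, which mirrors the "marked missing position" step in the abstract. The stopping rule combines a stability tolerance with an iteration cap, matching the two termination conditions the abstract mentions.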

Keywords/Search Tags:

Data Preprocessing, Clustering, Recursive Fill, Random Forest, Second Selection

Related items

1	Research On Feature Selection Method Based On Random Forest
2	Research On Random Forest Algorithm Based On Feature Selection And Diversity
3	Research On Feature Selection And Classification Method Based On Random Forest For Medical Datasets
4	Research And Application Of User Clustering Method Based On Mixed Type Data Analysis
5	Random Forest Feature Selection
6	The Research On Random Forest Based On IV Feature Selection
7	Optimization Of Distributed Random Forest Algorithm Based On Hierarchical Subspace
8	Research On Adaptive Feature Selection And Parameter Optimization Algorithm For Random Forest
9	Research On Imbalanced Data Classification Method Based On Random Forest Algorithm
10	Research On Optimization Of Random Forest Algorithm And Its Application In Text Parallel Classification