Continuous Feature Selection And Its Application In Decision Information System

Posted on: 2020-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: S W Yang
GTID: 2518306131456344
Subject: Computer Science and Technology
Abstract/Summary:
Data preprocessing is an important step in machine learning before a model is built. Filling in missing values and selecting the important features of incomplete data are two key parts of it. The data sources used for feature selection often have problems such as missing values. Existing feature selection algorithms also either cannot determine the optimal feature subset or run slowly. To address these shortcomings, this paper designs and implements a data processing flow that turns an incomplete decision information system into a complete information subsystem. The main work consists of the following parts.

For missing-value filling, the incomplete data are first pre-filled with mean values, and the missing positions are marked. The now-complete data are then clustered, and the marked positions are refilled from the clusters. This recursive filling is repeated until the data are stable or the number of iterations exceeds a threshold. In the experiments, missing values are generated by randomly deleting entries. The squared errors between the original data and the data filled by different missing-value filling methods are then compared. The results show that the clustering-based recursive filling method fills the data set better, and the filled data differ less from the original data.

For feature selection, a random forest model ranks the features of the complete data, and a local traversal eliminates useless features. A second selection then filters the remaining features with a distance-based forward search strategy, which yields the final feature subset. The local traversal improves execution efficiency. The forward selection step solves the problem that traditional methods cannot determine the optimal number of features. The experimental results show that the proposed method selects feature subsets more effectively and improves the classification accuracy of the model.

This paper implements the full process of filling incomplete data and selecting important features. The process has been applied to lithology identification data with good results.
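The filling procedure described above can be sketched as follows. The abstract does not name the clustering algorithm or how a cluster supplies the fill value. This sketch therefore assumes several things, each of which is an illustrative choice rather than the author's implementation:

- plain k-means with deterministic farthest-point seeding;
- filling each marked cell with the mean of the observed values of that column inside the row's cluster;
- hypothetical parameters `k`, `tol` and `max_iter`.

All function names are hypothetical. A fixed set of deleted cells stands in for random deletion, so the squared-error comparison is reproducible.

```python
# Hypothetical sketch: the clustering algorithm is not named in the abstract,
# so plain k-means with deterministic farthest-point seeding is assumed here.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_fill(data):
    """Replace every None with the mean of the observed values in its column."""
    means = []
    for col in zip(*data):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def kmeans(rows, k, iters=20):
    """Lloyd's k-means; returns one cluster label per row."""
    # Seed with row 0, then repeatedly add the row farthest from all chosen centers.
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_recursive_fill(data, k=2, tol=1e-6, max_iter=50):
    """Mean pre-fill, then refill marked cells from their cluster until stable."""
    # Mark the missing positions before anything is filled.
    missing = [(i, j) for i, row in enumerate(data) for j, v in enumerate(row) if v is None]
    filled = mean_fill(data)
    for _ in range(max_iter):  # hard cap on the number of rounds
        labels = kmeans(filled, k)  # cluster the current complete data
        max_change = 0.0
        for i, j in missing:
            # Mean of the observed values of column j inside row i's cluster.
            peers = [data[r][j] for r in range(len(data))
                     if labels[r] == labels[i] and data[r][j] is not None]
            if peers:
                new = sum(peers) / len(peers)
                max_change = max(max_change, abs(new - filled[i][j]))
                filled[i][j] = new
        if max_change < tol:  # filled values no longer change: data are stable
            break
    return filled


def squared_error(original, filled, missing):
    """Sum of squared differences at the deleted positions."""
    return sum((original[i][j] - filled[i][j]) ** 2 for i, j in missing)


# Toy data with two well-separated groups; two cells are "deleted".
original = [[1.0, 1.0, 1.0], [1.1, 0.9, 1.0], [0.9, 1.1, 1.0], [1.0, 1.0, 1.1],
            [10.0, 10.0, 10.0], [10.1, 9.9, 10.0], [9.9, 10.1, 10.0], [10.0, 10.0, 9.9]]
holes = [(0, 2), (4, 0)]  # fixed here for reproducibility; the thesis deletes at random
incomplete = [[None if (i, j) in holes else v for j, v in enumerate(row)]
              for i, row in enumerate(original)]
print("mean fill error:   ", squared_error(original, mean_fill(incomplete), holes))
print("cluster fill error:", squared_error(original, cluster_recursive_fill(incomplete), holes))
```

On this toy data, the column-mean fill pulls each deleted cell toward the other group. The cluster-based refill instead recovers values close to the originals, so its squared error is far smaller. This mirrors the kind of comparison the abstract reports.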
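The second, distance-based forward selection in the feature-selection step can be sketched as shown below. This sketch makes several assumptions:

- The random-forest ranking and local-traversal pruning have already run. Their output is passed in as a hypothetical `ranking` list.
- The abstract only says the search works "in terms of distance". Leave-one-out 1-nearest-neighbour accuracy is used here as one plausible distance-based criterion.
- The features are visited in rank order.

The function names and toy data are illustrative, not the thesis's code.

```python
# Hypothetical sketch of the second (forward) selection. The random-forest
# ranking and local-traversal pruning are assumed to have run already, and
# leave-one-out 1-nearest-neighbour accuracy is an assumed distance criterion.


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `features`."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if j == i:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:  # strict '<' keeps the lowest index on ties
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def forward_select(X, y, ranking):
    """Walk features in rank order; keep each one only if accuracy strictly rises."""
    selected, best = [], 0.0
    for f in ranking:
        score = loo_1nn_accuracy(X, y, selected + [f])
        if score > best:
            selected, best = selected + [f], score
    return selected, best


# Toy data: feature 0 separates the classes only partly, feature 1 completes
# the separation, feature 2 is noise. `ranking` stands in for random-forest output.
X = [[0, 0, 7], [2, 0, 1], [10, 0, 4], [11, 10, 2], [14, 10, 8], [16, 10, 3]]
y = ["A", "A", "A", "B", "B", "B"]
ranking = [0, 1, 2]
print(forward_select(X, y, ranking))  # -> ([0, 1], 1.0)
```

A feature is kept only when it strictly raises the score. The subset size therefore falls out of the search instead of being fixed in advance, which is the property the abstract attributes to the forward selection step.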
Keywords/Search Tags: Data Preprocessing, Clustering, Recursive Fill, Random Forest, Second Selection