
Efficient Feature Selection Algorithm For Category Data

Posted on: 2021-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: J C Liu
Full Text: PDF
GTID: 2428330620963078
Subject: Computer application technology
Abstract/Summary:
With the advent of the big data era, massive data sets are generated in daily life, and how to acquire knowledge from these data efficiently is a widespread concern among experts and scholars. Data mining is the process of extracting useful information and knowledge implicit in massive, incomplete, and fuzzy real-world data, and feature selection is a widely used data preprocessing technique in data mining. Real-world data are rarely static or complete: labels may be missing, and the data may change dynamically. How to acquire knowledge efficiently from such data sets is therefore the main research topic of this thesis. Using rough set theory and information entropy as tools for categorical data, the thesis makes the following three contributions.

First, to address the problem of updating feature selection results when the dimension of a dynamic data set with missing information changes, we analyze the updating mechanism of complementary entropy when the dimension of a data set with missing data increases, and we propose an incremental feature selection algorithm for incomplete data with increasing dimension. Experiments further verify the feasibility and efficiency of the new algorithm.

Second, to select features efficiently in partially labeled data, we present a rough feature selection algorithm built on the concepts of rough sets and information entropy. By analyzing the information entropy of the labeled and unlabeled data in a given data set, we redefine the information entropy of the whole data set. We then define feature importance based on information entropy in the semi-supervised sense and design a semi-supervised rough feature selection algorithm based on information entropy that handles partially labeled data effectively. Experimental results show that the new algorithm is efficient.

Third, also for feature selection on partially labeled data sets, we introduce the definition of object coupling similarity, use it to redesign the distance measure in the ReliefF algorithm, and design a semi-supervised feature selection algorithm based on ReliefF. The experimental results also verify the effectiveness of this algorithm.

By analyzing practical problems that arise in feature selection, this thesis designs three efficient feature selection algorithms that effectively select target feature subsets in dynamic data sets and partially labeled data sets. They can offer new ideas for subsequent data mining and knowledge discovery, and provide new research methods and theoretical support for related problems.
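
To make the idea of entropy-based feature importance on categorical data more concrete, here is a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give its entropy definitions, so the sketch substitutes plain Shannon conditional entropy over rough-set equivalence classes on fully labeled, complete data. The function names `conditional_entropy` and `greedy_select`, the tolerance `tol`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, labels, features):
    """Shannon conditional entropy H(D | B) of the labels D given a feature subset B.

    Objects with identical values on `features` form one equivalence class,
    as in rough set theory. Stand-in measure, not the thesis's definition.
    """
    n = len(rows)
    blocks = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)  # categorical values on B
        blocks[key].append(label)
    h = 0.0
    for block in blocks.values():
        size = len(block)
        for count in Counter(block).values():
            h -= (count / n) * log2(count / size)
    return h


def greedy_select(rows, labels, tol=1e-12):
    """Forward selection: repeatedly add the feature that lowers H(D | B) most,
    until B is as informative about the labels as the full feature set."""
    all_features = list(range(len(rows[0])))
    target = conditional_entropy(rows, labels, all_features)
    selected = []
    current = conditional_entropy(rows, labels, selected)
    while current - target > tol and len(selected) < len(all_features):
        remaining = [f for f in all_features if f not in selected]
        best = min(
            remaining,
            key=lambda f: conditional_entropy(rows, labels, selected + [f]),
        )
        selected.append(best)
        current = conditional_entropy(rows, labels, selected)
    return selected


# Hypothetical toy data: three categorical features, one class label.
rows = [
    ("sunny", "hot", "high"),
    ("sunny", "mild", "high"),
    ("rain", "hot", "low"),
    ("rain", "mild", "low"),
]
labels = ["no", "no", "yes", "yes"]

print(greedy_select(rows, labels))  # prints [0]
```

In this toy example feature 0 alone determines the label, so the search stops after one step. The incremental and semi-supervised variants described above would replace the entropy measure and avoid recomputing it from scratch; the sketch does neither.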
Keywords/Search Tags:Feature selection, Category data, Rough set, Information entropy