
Feature Selection Of Information Systems Based On Neighborhood Tolerance Rough Sets

Posted on: 2019-06-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L Wang
GTID: 2348330542471976 | Subject: Mathematics
Abstract/Summary:
Rough set theory is an important data analysis tool for characterizing inaccurate, inconsistent and incomplete data. However, classical rough set theory is defined on equivalence relations and can therefore handle only symbolic data. When continuous data are discretized, the discretization may lose vital information, and different discretization methods also affect the final reduction result differently. Moreover, because of measurement error, technological limitations and errors in data comprehension, much of the data acquired in real life is incomplete, which greatly limits the application of classical rough sets to real data. How to preprocess massive amounts of data effectively and extract potentially useful knowledge has thus become a significant research topic in the big data age.

In view of these problems, a series of extended rough set models have been proposed, such as models based on tolerance relations and similarity relations. These models can deal with incomplete data sets, while the neighborhood rough set model can process continuous data directly, which avoids the loss of important information that discretization can cause.

Building on the neighborhood tolerance rough set, this thesis makes the following contributions. First, on the basis of the neighborhood tolerance rough set extension model, a new method for computing neighborhood tolerance rough entropy is proposed. From this entropy function, the thesis defines neighborhood tolerance conditional entropy and attribute importance, which lay the foundation for the later feature selection algorithm. Second, the thesis gives a method for calculating the threshold value. The disadvantages of a fixed threshold and of a single threshold are analyzed, and a new threshold calculation method based on the standard deviation is proposed. Using a set of thresholds rather than a single threshold makes the classification result more accurate. Third, a new feature selection algorithm (SFGFFSNNTC) based on neighborhood tolerance entropy is given. The algorithm uses the neighborhood tolerance relation to define a relation matrix, which reduces running time. In addition, attribute importance is defined through the neighborhood tolerance entropy, which avoids the inaccurate attribute dependency that missing values can cause. Finally, the feature selection algorithm is tested on data sets from the UCI database, and the results verify the validity of the improved algorithm.
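The abstract does not state the thesis's formulas, so the following Python sketch only illustrates the general idea using formulations that are common in the neighborhood rough set literature. Everything in it is an assumption rather than the author's method: the per-attribute threshold is taken as the standard deviation divided by a hypothetical parameter `lam`, a missing value (marked `None`) is treated as matching any value, the conditional entropy is the average of -log2 of the label purity of each object's neighborhood, and `greedy_select` is a plain forward search, not SFGFFSNNTC itself.

```python
import math

MISSING = None  # assumed marker for a missing attribute value


def std_thresholds(rows, numeric, lam=2.0):
    """Assumed rule: one threshold per numeric attribute, std / lam."""
    deltas = {}
    for a in numeric:
        vals = [r[a] for r in rows if r[a] is not MISSING]
        if not vals:  # attribute entirely missing: fall back to exact match
            deltas[a] = 0.0
            continue
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        deltas[a] = math.sqrt(var) / lam
    return deltas


def relation_matrix(rows, attrs, deltas):
    """m[i][j] is True when objects i and j are related on every attribute."""
    def close(x, y, a):
        if x[a] is MISSING or y[a] is MISSING:
            return True  # tolerance: a missing value matches anything
        if a in deltas:  # numeric attribute: neighborhood test
            return abs(x[a] - y[a]) <= deltas[a]
        return x[a] == y[a]  # categorical attribute: equality test
    return [[all(close(x, y, a) for a in attrs) for y in rows] for x in rows]


def conditional_entropy(rows, attrs, deltas, labels):
    """Average of -log2(share of same-label objects in each neighborhood)."""
    n = len(rows)
    m = relation_matrix(rows, attrs, deltas)
    total = 0.0
    for i in range(n):
        nbrs = [j for j in range(n) if m[i][j]]  # always contains i itself
        same = sum(1 for j in nbrs if labels[j] == labels[i])
        total += math.log2(same / len(nbrs))
    return -total / n


def greedy_select(rows, all_attrs, deltas, labels, eps=1e-9):
    """Forward selection: add the attribute that lowers the entropy most."""
    selected, remaining = [], list(all_attrs)
    current = conditional_entropy(rows, selected, deltas, labels)
    while remaining:
        best = min(
            remaining,
            key=lambda a: conditional_entropy(rows, selected + [a], deltas, labels),
        )
        e = conditional_entropy(rows, selected + [best], deltas, labels)
        if current - e <= eps:  # no further gain: stop
            break
        selected.append(best)
        remaining.remove(best)
        current = e
    return selected


# Toy incomplete mixed data (hypothetical): "t" is numeric, "c" is categorical.
rows = [
    {"t": 1.0, "c": "a"},
    {"t": 1.1, "c": "a"},
    {"t": 5.0, "c": "b"},
    {"t": MISSING, "c": "b"},
]
labels = [0, 0, 1, 1]
deltas = std_thresholds(rows, numeric=["t"])
chosen = greedy_select(rows, ["t", "c"], deltas, labels)
```

In this toy table the categorical attribute alone separates the two classes, so the search stops after selecting it. The matrix is recomputed for every candidate subset here to keep the sketch short; a real implementation would presumably cache one matrix per attribute and combine them.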
Keywords/Search Tags: Neighborhood Rough Set, Neighborhood Tolerance Relation, Threshold Selection, Feature Selection, Incomplete Mixed Data