Study On Feature Selection Based On Neighborhood Rough Set

Posted on: 2012-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: P Wang
Full Text: PDF
GTID: 2178330332994756
Subject: Computer Science and Technology
Abstract/Summary:
With the advance of informatization, practical applications produce massive amounts of data that may contain redundancy and noise. Eliminating this redundancy and noise can greatly improve data processing capability. As an effective tool for analyzing imprecise, inconsistent, and incomplete data, Rough Set Theory (RST), proposed by Pawlak, has been applied successfully in many areas. However, the RST model uses equivalence relations to partition the universe and takes the resulting mutually exclusive equivalence classes as elementary concepts, so it applies only to discrete feature data. For continuous data, the continuity of the feature values prevents RST from being used directly. As an extension of RST, the Neighborhood Rough Set (NRS) model has drawn increasing attention from researchers. This thesis studies feature selection algorithms based on NRS. The main contributions are as follows:

(1) After analyzing the disadvantages of using a single, specified threshold as the neighborhood size, a new method that sets the threshold from the standard deviation is proposed. In this method, a threshold vector replaces the single threshold: each threshold is the standard deviation of the corresponding feature's data divided by a parameter n. From this threshold vector, a group of neighborhood relation matrices is obtained, and feature importance and other NRS-based concepts can then be computed from these matrices.

(2) A feature selection algorithm based on forward greedy selection is proposed. Starting from the empty set and using NRS-based feature importance as the measure, the algorithm successively adds the most important feature to the reduct until the reduct condition is reached. To compute feature importance more quickly, the algorithm uses the neighborhood relation matrices generated from the threshold vector, together with the monotonic relationship between the feature set and the positive region of the decision. The experimental results show that this algorithm not only reduces runtime effectively but also decreases the number of features in the reduct without loss of classification accuracy.

(3) After transforming feature selection into a nonlinear combinatorial optimization problem, an NRS-based feature selection algorithm is proposed by introducing the ant colony system (ACS). The algorithm uses NRS-based feature importance as heuristic information and the number of features in the reduct as the fitness criterion, and then uses ACS to search for a minimum feature reduct. The experimental results show that this algorithm can find multiple minimum feature reducts and has an advantage in the number of features in the reduct. Although the reducts generated by this algorithm differ in classification accuracy, at least one of them always achieves higher classification accuracy than the other feature selection algorithms in this thesis.
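Contributions (1) and (2) can be illustrated with a minimal sketch. It is not the thesis implementation, and several details are assumptions that the abstract does not state: the population standard deviation is used, two samples are treated as neighbors on a feature subset only if they are within the threshold on every feature in that subset, feature importance is measured as the size of the positive region of the decision divided by the number of samples, and the search stops when no remaining feature increases that value. All function names are hypothetical.

```python
from statistics import pstdev


def threshold_vector(X, n):
    """Per-feature neighborhood size: standard deviation divided by n.
    Assumption: population standard deviation (pstdev)."""
    return [pstdev(column) / n for column in zip(*X)]


def relation_matrices(X, deltas):
    """One boolean matrix per feature: R[k][i][j] is True when samples
    i and j differ by at most deltas[k] on feature k."""
    m = len(X)
    return [
        [[abs(X[i][k] - X[j][k]) <= d for j in range(m)] for i in range(m)]
        for k, d in enumerate(deltas)
    ]


def dependency(R, y, subset):
    """Fraction of samples whose neighborhood under `subset` contains
    only samples with the same decision label (the positive region)."""
    if not subset:
        return 0.0  # convention for this sketch: empty set scores 0
    m = len(y)
    consistent = 0
    for i in range(m):
        # j is a neighbor of i only if it is close on every chosen feature
        pure = all(
            y[i] == y[j]
            for j in range(m)
            if all(R[k][i][j] for k in subset)
        )
        consistent += pure
    return consistent / m


def forward_greedy(X, y, n=2.0, eps=1e-9):
    """Add the feature with the highest dependency until nothing improves."""
    R = relation_matrices(X, threshold_vector(X, n))  # computed once, reused
    reduct, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scores = [dependency(R, y, reduct + [a]) for a in remaining]
        top = max(range(len(remaining)), key=scores.__getitem__)
        if scores[top] - best <= eps:  # assumed stopping condition
            break
        best = scores[top]
        reduct.append(remaining.pop(top))
    return reduct, best
```

Calling `forward_greedy(X, y)` on a list of numeric rows `X` and a list of labels `y` returns the selected feature indices and the dependency they reach. Building the per-feature relation matrices once and combining them per subset mirrors the reuse described in (2); a larger `n` gives smaller neighborhoods.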
Keywords/Search Tags: Neighborhood Rough Set, Size of neighborhood, Feature selection, Ant colony system