Research on Attribute Reduction Algorithms Based on Neighborhood Rough Sets

Posted on: 2012-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: N Li
Full Text: PDF
GTID: 2208330335471184
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer, networking, and storage technology, it has become ever easier to access and store large amounts of data, and huge datasets have emerged as a result. However, such data contain much redundant, uncertain, and incomplete information, which seriously hinders the extraction of important knowledge. Meanwhile, additions to, deletions from, and changes in the data affect the core data. How to extract important knowledge from redundant, uncertain, incomplete, and changing data to support decisions or predictions has therefore become an important research topic in data mining.

Rough set theory (RST), proposed by the Polish mathematician Z. Pawlak in 1982, is based on set theory and has become a powerful mathematical tool for dealing with uncertain and fuzzy data. RST can reduce data without changing their classification properties. However, classical RST can process only symbolic data, not continuous data. The neighborhood rough set model was proposed to address this limitation. It processes continuous and mixed datasets effectively while avoiding the loss of important or hidden information caused by discretizing continuous data. Recently, many researchers have focused on attribute reduction based on neighborhood rough sets.

Neighborhood rough set theory can deal with heterogeneous attribute reduction effectively, and this thesis studies attribute reduction based on it. The work done in this thesis is as follows.

First, we compute the significance of each attribute according to the positive region based on neighborhood theory, rank the attributes in descending order of significance, and then add them one by one to the selected attribute subset until all attributes have been selected and tested.
The classification accuracy of each attribute subset is evaluated with a support vector classifier, and the subset with the best classification accuracy is finally chosen. The proposed algorithm is compared with the forward attribute reduction algorithm based on the neighborhood rough set model. Experimental results show that our algorithm leads to better classification performance with far fewer selected attributes.

Second, an attribute reduction algorithm based on neighborhood rough set theory is presented for datasets that are updated by adding samples. It is well known that adding samples can change the attribute reduction of a dataset. We analyze thoroughly how the positive region changes when a new sample is added to the dataset, and discuss how to update the attribute reduction selectively in each case. Selectively updating the original attribute reduction avoids unnecessary operations and reduces the complexity of the attribute reduction algorithm. Finally, we demonstrate the algorithm on a real example.

Third, a new forward attribute reduction algorithm is proposed to reduce the heavy computational load of existing attribute subset selection algorithms for incomplete decision systems. The algorithm is based on the principle that the discernibility of an incomplete decision system is preserved as long as its positive region remains unchanged. Starting from an empty reduction subset, the algorithm ranks the attributes in descending order of their influence on the positive region and adds the top-ranked attribute to the subset. The algorithm is generalized to heterogeneous datasets on the basis of neighborhood rough sets, so that attribute reduction can be carried out for incomplete decision systems with quantitative or heterogeneous attributes.
Experimental results on UCI datasets and an example analysis both show that the forward attribute reduction algorithm based on neighborhood rough sets can obtain an attribute subset efficiently for incomplete decision systems with quantitative or heterogeneous attributes. Its potential disadvantage is that it may fail to complete the attribute reduction process if the most important attribute is not found in the first iteration. Finally, the experimental results demonstrate the feasibility and effectiveness of the proposed method.
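
The idea of forward attribute reduction driven by the neighborhood positive region can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the Euclidean distance, the neighborhood radius `delta`, the stopping threshold `eps`, and all function names are assumptions made for illustration, and the sketch handles only complete numerical data (no missing values, no symbolic attributes).

```python
from math import sqrt


def neighborhood(data, i, attrs, delta):
    """Indices of samples within Euclidean distance `delta` of sample i,
    measured only on the attribute indices in `attrs` (assumed metric)."""
    xi = data[i]
    return [
        j for j, xj in enumerate(data)
        if sqrt(sum((xi[a] - xj[a]) ** 2 for a in attrs)) <= delta
    ]


def positive_region(data, labels, attrs, delta):
    """Samples whose whole neighborhood carries the same decision label."""
    return {
        i for i in range(len(data))
        if all(labels[j] == labels[i]
               for j in neighborhood(data, i, attrs, delta))
    }


def dependency(data, labels, attrs, delta):
    """Fraction of samples that fall in the positive region."""
    return len(positive_region(data, labels, attrs, delta)) / len(data)


def forward_reduction(data, labels, delta, eps=1e-9):
    """Greedy forward selection: start from an empty subset and repeatedly
    add the attribute that enlarges the positive region the most; stop when
    no attribute improves the dependency by more than `eps`."""
    remaining = list(range(len(data[0])))
    reduct, current = [], 0.0
    while remaining:
        scored = [(dependency(data, labels, reduct + [a], delta), a)
                  for a in remaining]
        # Highest dependency wins; ties go to the lowest attribute index.
        best, attr = max(scored, key=lambda s: (s[0], -s[1]))
        if best - current <= eps:
            break
        reduct.append(attr)
        remaining.remove(attr)
        current = best
    return reduct, current


# Hypothetical toy data: attribute 0 separates the two classes,
# attributes 1 and 2 do not.
toy_data = [
    [0.10, 0.50, 0.3],
    [0.15, 0.90, 0.7],
    [0.20, 0.10, 0.5],
    [0.80, 0.55, 0.4],
    [0.85, 0.95, 0.6],
    [0.90, 0.15, 0.5],
]
toy_labels = [0, 0, 0, 1, 1, 1]
```

On this toy data with `delta = 0.2`, `forward_reduction(toy_data, toy_labels, 0.2)` selects attribute 0 alone, because it already places every sample in the positive region. The sketch also shows the limitation noted above: if no single attribute yields a nonempty positive region in the first iteration, the greedy loop stops with an empty subset.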
Keywords/Search Tags: neighborhood rough set, attribute reduction, support vector machines (SVM), incremental updating, incomplete decision system