
Uncertainty Measure In F-Rough Sets And Performance Tuning For Rough Sets

Posted on: 2015-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: M H Pei
Full Text: PDF
GTID: 2298330431493435
Subject: Computer software and theory
Abstract/Summary:
Rough set theory is an effective mathematical tool for dealing with imprecise, vague, and incomplete information. It has been widely used for classification and feature selection (i.e., attribute reduction) in data mining, machine learning, and pattern recognition. Concrete ways to obtain attribute reducts include the positive region, the discernibility matrix and function, information entropy, and attribute significance.

The first aspect of this paper is uncertainty measurement in F-rough sets. The F-rough set model is a new rough set model for a family of information systems and decision systems. Using the ideas of upper and lower approximation in F-rough sets, this paper defines several measures of concept drift, including measures of concept drift for the upper and lower approximations and the coincidence degrees of concept drift for the upper and lower approximations. Moreover, this paper investigates some properties of these measures.

The second aspect of this paper is performance tuning for rough sets. Partitioning takes a great deal of time in all kinds of reduct algorithms for rough sets, and comparison operations dominate the partitioning process. Early reduct algorithms adopt a brute-force strategy and therefore spend a great deal of time on comparison operations during partitioning, which is unacceptable when reducing a large decision table. Hashing handles the partitioning problem elegantly and efficiently by reducing the number of comparison operations significantly. This paper improves the performance of partitioning a decision table by adopting the hashing methodology, and it gives a detailed comparison between hashing and the sort-based method. With the hashing-partitioning algorithm, time-consuming comparison operations are reduced significantly, so many rough set algorithms become more efficient.
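The hashing-partitioning idea can be illustrated with a minimal sketch (this is an illustration of the general technique, not the thesis's actual algorithm; the `hash_partition` function, the dict-of-rows table representation, and the toy data are assumptions for the example). Each row's values on the chosen attributes form a hash key, so the equivalence classes of the indiscernibility relation emerge in a single O(n) pass, with no pairwise row comparisons:

```python
from collections import defaultdict

def hash_partition(table, attrs):
    """Partition the rows of a decision table into equivalence classes.

    Two rows fall into the same block when they agree on every
    attribute in `attrs`. Hashing the value tuple groups the rows in
    one linear pass instead of comparing rows pairwise.
    """
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        key = tuple(row[a] for a in attrs)  # values on the chosen attributes
        blocks[key].append(i)               # collect row indices per block
    return list(blocks.values())

# Toy decision table: condition attributes a, b and decision d.
table = [
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
]
print(hash_partition(table, ["a", "b"]))  # → [[0, 1], [2]]
```

A sort-based method would instead order the rows by their attribute tuples and scan for boundaries, which costs O(n log n) comparisons; the hash table replaces those comparisons with constant-time lookups, which is the source of the speedup the paper measures.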
Experiments show that our method is well suited to computing the positive region, the core attributes of a decision table, Pawlak reducts based on the positive region, and parallel reducts based on the matrix of attribute significance.

The other highlight of this paper is its ability to handle large-scale data. Many authors claim that their methods can handle large-scale data, but in fact they only test data sets with no more than 100,000 instances. This paper not only tests several data sets with several million instances but also tests a synthetic 3.2 GB data set with about 40,000,000 instances.

Finally, hashing-partitioning can also upgrade parallel reducts, which obtain an approximate reduct of a decision table. Tests show that the upgraded algorithm is superior to its counterpart.
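To make the positive-region application concrete, here is a minimal sketch of how hash partitioning yields the positive region POS_C(D) of a decision table: a condition-attribute block belongs to the positive region exactly when all of its rows carry the same decision value. (The function name, the dict-of-rows representation, and the toy data are assumptions for illustration, not the thesis's code.)

```python
from collections import defaultdict

def positive_region(table, cond_attrs, dec_attr):
    """Return the indices of rows in the positive region POS_C(D).

    Rows are first grouped into condition-attribute equivalence classes
    by hashing; a class contributes to the positive region only if its
    members all share one decision value (i.e. it classifies consistently).
    """
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in cond_attrs)].append(i)
    pos = []
    for members in blocks.values():
        decisions = {table[i][dec_attr] for i in members}
        if len(decisions) == 1:      # consistent block → in the positive region
            pos.extend(members)
    return sorted(pos)

table = [
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 1, "d": "no"},
    {"a": 1, "b": 1, "d": "yes"},  # conflicts with the previous row
]
print(positive_region(table, ["a", "b"], "d"))  # → [0, 1, 2]
```

Core attributes and positive-region-based reducts are then found by recomputing this set with attributes removed: an attribute is in the core if dropping it shrinks the positive region. Since each recomputation is just another linear hashing pass, the speedup from hash partitioning compounds across the whole reduct search.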
Keywords/Search Tags: F-Rough Sets, Concept Drift, Hashing-Partitioning, Parallel Reducts, Pawlak Reducts