
Uncertainty Measure In F-Rough Sets And Performance Tuning For Rough Sets

Posted on: 2015-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: M H Pei
Full Text: PDF
GTID: 2298330431493435
Subject: Computer software and theory
Abstract/Summary:
Rough set theory is an effective mathematical tool for dealing with imprecise, vague, and incomplete information. It has been widely used for classification and feature selection (i.e., attribute reduction) in data mining, machine learning, and pattern recognition. Concrete ways to obtain attribute reducts include the positive region, the discernibility matrix and function, information entropy, and attribute significance.

The first aspect of this paper is uncertainty measurement in F-rough sets. The F-rough set model is a new rough set model for a family of information systems and decision systems. Using the ideas of upper and lower approximation in F-rough sets, this paper defines several measures of concept drift, including measures of concept drift for the upper and lower approximations and the coincidence degrees of concept drift for the upper and lower approximations. Moreover, this paper investigates some properties of these measures.

The second aspect of this paper is performance tuning for rough sets. Partitioning takes a great deal of time in all kinds of reduct algorithms for rough sets, and comparison operations dominate the partitioning process. Early reduct algorithms adopt a brute-force strategy and therefore spend a great deal of time on comparison operations during partitioning, which is unacceptable when reducing a large decision table. Hashing handles the partitioning problem elegantly and efficiently by reducing the number of comparison operations significantly. This paper improves the performance of partitioning a decision table by adopting the hashing methodology, and it gives a detailed comparison between hashing and the sort-based method. With the hashing-partitioning algorithm, time-consuming comparison operations are reduced significantly, so many rough set algorithms become more efficient.
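The hashing-partitioning idea can be illustrated with a minimal sketch (this is an illustration of the general technique, not the thesis's actual algorithm; the `hash_partition` function, the dict-of-rows table representation, and the toy data are assumptions for the example). Each row's values on the chosen attributes form a hash key, so the equivalence classes of the indiscernibility relation emerge in a single O(n) pass, with no pairwise row comparisons:

```python
from collections import defaultdict

def hash_partition(table, attrs):
    """Partition the rows of a decision table into equivalence classes.

    Two rows fall into the same block when they agree on every
    attribute in `attrs`. Hashing the value tuple groups the rows in
    one linear pass instead of comparing rows pairwise.
    """
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        key = tuple(row[a] for a in attrs)  # values on the chosen attributes
        blocks[key].append(i)               # collect row indices per block
    return list(blocks.values())

# Toy decision table: condition attributes a, b and decision d.
table = [
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
]
print(hash_partition(table, ["a", "b"]))  # → [[0, 1], [2]]
```

A sort-based method would instead order the rows by their attribute tuples and scan for boundaries, which costs O(n log n) comparisons; the hash table replaces those comparisons with constant-time lookups, which is the source of the speedup the paper measures.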
Experiments show that our method is well suited to computing the positive region, the core attributes of a decision table, Pawlak reducts based on the positive region, and parallel reducts based on the matrix of attribute significance.

The other highlight of this paper is its ability to handle large-scale data. Many authors claim that their methods can handle large-scale data, but in fact they only test data sets with no more than 100,000 instances. This paper not only tests several data sets with several million instances but also tests a synthetic 3.2 GB data set with about 40,000,000 instances.

Finally, hashing-partitioning can also upgrade parallel reducts, which obtain an approximate reduct of a decision table. Tests show that the upgraded algorithm is superior to its counterpart.
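To make the positive-region application concrete, here is a minimal sketch of how hash partitioning yields the positive region POS_C(D) of a decision table: a condition-attribute block belongs to the positive region exactly when all of its rows carry the same decision value. (The function name, the dict-of-rows representation, and the toy data are assumptions for illustration, not the thesis's code.)

```python
from collections import defaultdict

def positive_region(table, cond_attrs, dec_attr):
    """Return the indices of rows in the positive region POS_C(D).

    Rows are first grouped into condition-attribute equivalence classes
    by hashing; a class contributes to the positive region only if its
    members all share one decision value (i.e. it classifies consistently).
    """
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in cond_attrs)].append(i)
    pos = []
    for members in blocks.values():
        decisions = {table[i][dec_attr] for i in members}
        if len(decisions) == 1:      # consistent block → in the positive region
            pos.extend(members)
    return sorted(pos)

table = [
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 1, "d": "no"},
    {"a": 1, "b": 1, "d": "yes"},  # conflicts with the previous row
]
print(positive_region(table, ["a", "b"], "d"))  # → [0, 1, 2]
```

Core attributes and positive-region-based reducts are then found by recomputing this set with attributes removed: an attribute is in the core if dropping it shrinks the positive region. Since each recomputation is just another linear hashing pass, the speedup from hash partitioning compounds across the whole reduct search.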
Keywords/Search Tags: F-Rough Sets, Concept Drift, Hashing-Partitioning, Parallel Reducts, Pawlak Reducts