
Study On Rough Set Theory For Data Mining

Posted on: 2004-04-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Yu
Full Text: PDF
GTID: 1118360122470364
Subject: Computer software and theory
Abstract/Summary:
Rough set theory is a sound mathematical theory developed in recent years that can handle imprecise, uncertain, and vague information and can discover valid, potentially useful knowledge in data. Since its introduction it has become increasingly popular and has been applied successfully in fields such as machine learning, data mining, and intelligent data analysis. This thesis studies several key technical problems in data mining based on rough set theory.
It is well known that large knowledge repositories usually contain much redundant data. Such data not only waste storage space but also interfere with decision making. The key problem of knowledge reduction is to express the same knowledge without superfluous knowledge. In this thesis, knowledge reduction is studied from the viewpoint of information theory.
The algebraic representation and the information-theoretic representation of rough set theory are systematically analyzed and compared. The following laws are found: ① as the number of condition attributes increases, the conditional entropy of the decision attributes given the condition attributes (conditional entropy for short) is monotonically non-increasing, that is, it never increases but need not strictly decrease; ② if the reduction process starts from the core of a decision table, then each time a non-removable attribute is added to the reduction, the conditional entropy of the decision attributes given the reduction decreases monotonically; ③ the conditional entropy of the decision table does not change during the reduction process. Two new algorithms based on conditional entropy, CEBARKCC and CEBARKNC, are then developed. These algorithms are analyzed and compared with the MIBARK algorithm both theoretically and experimentally.
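
Law ① can be illustrated with a minimal sketch that computes the conditional entropy H(D | B) of a decision attribute D given a set of condition attributes B. The toy table, the attribute names `a`, `b`, `c`, `d`, and the function `conditional_entropy` are assumptions made for illustration; they are not taken from the thesis and this is not the CEBARKCC or CEBARKNC algorithm.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, cond_attrs, decision):
    """Return H(decision | cond_attrs) in bits for a list of dict rows."""
    n = len(rows)
    # Group rows into equivalence classes induced by cond_attrs and
    # count the decision values inside each class.
    blocks = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        blocks[key][row[decision]] += 1

    h = 0.0
    for counts in blocks.values():
        size = sum(counts.values())
        for c in counts.values():
            # p(block) * p(decision | block) * log2 p(decision | block)
            h -= (size / n) * (c / size) * log2(c / size)
    return h


# Hypothetical decision table: d depends on both a and b; c duplicates a.
TABLE = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
]
ATTRS = ["a", "b", "c"]

# Entropy after adding the condition attributes one at a time.
entropies = [
    conditional_entropy(TABLE, ATTRS[:k], "d") for k in range(len(ATTRS) + 1)
]
for k, h in enumerate(entropies):
    print(ATTRS[:k], round(h, 3))
```

In this toy table the entropy stays the same when `a` is added alone, drops when `b` is added, and stays the same again when the redundant `c` is added, which is the non-strict decrease that law ① describes. Since H(d | a, b) equals H(d | a, b, c), the subset {a, b} preserves the conditional entropy of the full table, in the sense of law ③.
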
The comparison of CEBARKCC, CEBARKNC, and MIBARK leads to the following conclusion: for a decision table, the closer the ratio of the cardinality of the core to the cardinality of the reduction is to 1, the more efficient CEBARKCC or MIBARK is compared with CEBARKNC; conversely, the closer the ratio is to 0, the more efficient CEBARKNC is compared with CEBARKCC and MIBARK. Simulation results show that the algorithms find the minimal reduction in most cases.
In practice, many kinds of object information keep growing, such as client information, sales information, product information, and network information. In addition, some fields require real-time or online processing, such as intrusion detection and e-mail filtering. Consequently, incremental knowledge acquisition is also studied in the thesis.
The thesis redefines the categorization of new instances with respect to the existing knowledge M, namely the selected minimal set of rules. When a new instance is added to the universe U of actual knowledge, the laws governing how the attribute reduction and the value reduction change are stated as three theorems. The theorems show the following: ① if a new instance x confirms M, nothing needs to be done to the set M; ② if a new instance x partially contradicts, similarly partially contradicts, or similarly confirms M, whether R is still an attribute reduction of U' can be decided by comparing the values of the attributes in R between x and each instance in U; ③ if R is still an attribute reduction of U', the value reduction of an instance in U is unchanged, provided that the instance does not contradict x. A new incremental knowledge acquisition algorithm based on rough set theory, the IKAA algorithm, is then developed. The new IKAA algorithm and the classical NIKAA algorithm based on rough set theory are analyzed and compared both theoretically and experimentally.
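
The comparison described in theorem ② can be sketched as follows, under simplifying assumptions that are not stated in the abstract: the decision table is consistent before and after the new instance is added, and "R is still an attribute reduction" is read as "R still separates every pair of instances with different decisions". The helper `conflicts_on_reduct` and the toy data are hypothetical, and the sketch is not the IKAA algorithm itself.

```python
def conflicts_on_reduct(universe, new_instance, reduct, decision):
    """Return the instances that agree with new_instance on every attribute
    in reduct but carry a different decision value."""
    return [
        u
        for u in universe
        if all(u[a] == new_instance[a] for a in reduct)
        and u[decision] != new_instance[decision]
    ]


# Hypothetical consistent decision table with reduct R = {a, b}.
U = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
]
R = ["a", "b"]

# Same R-values and same decision as an existing instance: no conflict.
x_ok = {"a": 0, "b": 0, "c": 0, "d": "no"}
# Same R-values as the first instance but a different decision: conflict.
x_bad = {"a": 0, "b": 0, "c": 1, "d": "yes"}

print(conflicts_on_reduct(U, x_ok, R, "d"))
print(conflicts_on_reduct(U, x_bad, R, "d"))
```

Under these assumptions, an empty result means R still separates the new instance from every instance with a different decision, so R can be kept. A non-empty result means R alone can no longer tell the new instance apart from the listed instances, so the reduction has to be revisited. Only one pass over U is needed, which is the kind of saving an incremental approach aims for.
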
The comparison of IKAA and NIKAA leads to the following conclusions: in the confirming case, the proposed IKAA algorithm is much faster than the NIKAA algorithm in terms of time complexity; if a new instance x partially contradicts, similarly partially contradicts, or similarly confirms M, and R is unchanged in U', the IKAA algorithm is faster th...
Keywords/Search Tags: Rough Set Theory, Data Mining, Knowledge Reduction, Incremental Knowledge Acquisition, E-mail Filter