The Data Mining Algorithm Based On Rough Sets

Posted on: 2005-12-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W J Liu
GTID: 1118360182979439
Subject: Fuzzy mathematics and artificial intelligence
Abstract/Summary:
Rough set theory is a new mathematical approach to the analysis of uncertain and vague data. It has been applied successfully to machine learning, knowledge acquisition, decision analysis, knowledge discovery, expert systems, and pattern recognition. Its main advantage is that it needs no preliminary or additional information about the data, such as probability distributions in statistics, basic probability assignments in Dempster-Shafer theory, or grades of membership and possibility values in fuzzy set theory.

This thesis studies in detail how to extract hidden, useful information from databases using rough set theory. It has four main parts, which together cover data preprocessing, reduction of condition attributes, decision algorithms, and related topics.

The first part (chapters 3 to 6) studies the data mining process based on rough set theory. Chapter 3 considers completion algorithms, which fill in missing attribute values. We first review previous completion algorithms and analyze their limitations, then propose new completion algorithms for the different kinds of condition attributes. Chapter 4 addresses the discretization of continuous attributes. We first introduce the entropy-based discretization algorithm and discuss its limitations. We then define the information gain and the distance of a partition, and propose a discretization algorithm based on these two quantities. The chapter also defines a belief degree, studies its properties, and proposes a discretization algorithm based on it.

Chapter 5 studies attribute reduction and proposes two reduction algorithms.
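The first of these algorithms builds on the discernibility matrix. For readers unfamiliar with that structure, the Python sketch below shows the classic construction only: for every pair of objects with different decisions it records the condition attributes that distinguish them, reads off the core as the attributes appearing as singleton entries, and finds reducts as minimal attribute sets that intersect every entry. It is not the thesis's improved algorithm. The toy table, the function names, and the restriction to a consistent decision table are assumptions of this sketch.

```python
from itertools import combinations

# Hypothetical toy decision table (not from the thesis): condition attributes
# a, b, c and decision d. It is consistent, i.e. no two objects share all
# condition values while having different decisions.
TABLE = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
]
CONDITIONS = ["a", "b", "c"]
DECISION = "d"


def discernibility_matrix(table, conditions, decision):
    """Map each pair (i, j), i < j, with different decisions to the set of
    condition attributes on which objects i and j differ."""
    matrix = {}
    for i, j in combinations(range(len(table)), 2):
        if table[i][decision] != table[j][decision]:
            matrix[(i, j)] = {a for a in conditions
                              if table[i][a] != table[j][a]}
    return matrix


def core(matrix):
    """Attributes that appear as singleton entries.

    This shortcut is assumed valid here because the table is consistent.
    """
    return {next(iter(entry)) for entry in matrix.values() if len(entry) == 1}


def reducts(matrix, conditions):
    """Brute-force search, smallest sets first, for minimal attribute subsets
    that intersect every non-empty matrix entry (exponential in |conditions|)."""
    entries = [entry for entry in matrix.values() if entry]
    found = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            candidate = set(subset)
            hits_all = all(candidate & entry for entry in entries)
            # Skip supersets of reducts already found, so results stay minimal.
            if hits_all and not any(r <= candidate for r in found):
                found.append(candidate)
    return found


if __name__ == "__main__":
    matrix = discernibility_matrix(TABLE, CONDITIONS, DECISION)
    print(sorted(core(matrix)))                              # ['a', 'b']
    print([sorted(r) for r in reducts(matrix, CONDITIONS)])  # [['a', 'b']]
```

On this toy table the core and the only reduct are both {a, b}, so attribute c is redundant. The brute-force reduct search is for illustration only. For inconsistent decision tables the singleton-entry shortcut is not guaranteed to agree with the positive-region definition of the core, which is why the sketch assumes a consistent table.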
The first of the two reduction algorithms is an improved algorithm based on the discernibility matrix and logical operations; the second is based on the generalized information table. For the first algorithm, we show that the attribute reduction algorithm given in [23] is not always correct, analyze the cause of this error, and propose an improved algorithm on that basis. We verify the new algorithm on an example and prove that it obtains the same reduct as the algorithm given in [13] while requiring less computation. For the second algorithm, we first give a method of constructing the generalized information table of a decision table. From the features of this table we derive a way to measure the significance of condition attributes, and finally propose algorithms that find a minimum attribute reduction for consistent and for inconsistent decision tables, respectively.

Chapter 6 studies the minimum decision algorithm. We first introduce its basic principle and then, combining it with attribute reduction, propose several methods for obtaining a minimum decision algorithm. The chapter also compares the decision tree obtained this way with one built by information gain; the comparison shows that in some cases our decision tree is better than the one built by information gain.

The second part of the thesis studies generalizations of rough sets, focusing on deriving decision rules directly from decision tables that have continuous condition attributes or missing values.

The third part investigates algorithms for classifying new objects. We first define the similarity degree between two attribute values and between two objects. Then, following the principle of maximum similarity degree, we propose a classification algorithm based on similarity degree.
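The abstract does not give the thesis's definitions of similarity degree, so the following Python sketch substitutes simple hypothetical ones purely to show how a maximum-similarity classifier is wired together. It assumes that two nominal values have similarity 1 when equal and 0 otherwise, that two numeric values have similarity 1 - |x - y| / range, and that the similarity of two objects is the mean over attributes. A new object receives the decision of the most similar training object. The table, attribute names, and functions are all illustrative.

```python
# Hypothetical training table (not from the thesis): one nominal attribute,
# one numeric attribute, and decision d.
TRAINING = [
    {"outlook": "sunny", "temp": 30.0, "d": "no"},
    {"outlook": "rainy", "temp": 18.0, "d": "yes"},
    {"outlook": "sunny", "temp": 20.0, "d": "yes"},
]
# Value range of each condition attribute; None marks a nominal attribute.
RANGES = {"outlook": None, "temp": 12.0}


def value_similarity(x, y, value_range=None):
    """Assumed similarity of two attribute values, in [0, 1]."""
    if value_range is None:          # nominal: exact match or nothing
        return 1.0 if x == y else 0.0
    if value_range == 0:             # constant numeric attribute
        return 1.0
    return max(0.0, 1.0 - abs(x - y) / value_range)


def object_similarity(u, v, ranges):
    """Assumed object similarity: mean value similarity over the attributes."""
    scores = [value_similarity(u[a], v[a], ranges[a]) for a in ranges]
    return sum(scores) / len(scores)


def classify(new_obj, training, ranges, decision="d"):
    """Maximum-similarity rule: copy the decision of the most similar object."""
    best = max(training, key=lambda row: object_similarity(new_obj, row, ranges))
    return best[decision]


if __name__ == "__main__":
    print(classify({"outlook": "sunny", "temp": 22.0}, TRAINING, RANGES))  # yes
    print(classify({"outlook": "sunny", "temp": 29.0}, TRAINING, RANGES))  # no
```

The mean weights every attribute equally. Ties are broken by `max`, which keeps the first most-similar row in table order.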
Following the principle of maximum membership degree, we also propose a second classification algorithm, a weighted comprehensive classification method. It compensates for the difficulty of specifying the significance of each attribute.

Finally, the fourth part studies the algebraic properties of rough sets. For example, the collection of rough sets forms a bounded distributive lattice, a double Stone algebra, and a Lukasiewicz algebra.
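The lattice part of this algebraic claim can be checked by brute force on a small example. The Python sketch below assumes a hypothetical four-element universe whose indiscernibility relation has two blocks. It represents each rough set as the pair (lower approximation, upper approximation) of some subset, enumerates all of them, and checks three properties: closure under componentwise union and intersection, the presence of a bottom (∅, ∅) and a top (U, U), and the distributive law. It does not check the double Stone or Lukasiewicz operations, and one exhaustive example illustrates the theorem rather than proving it.

```python
from itertools import chain, combinations, product

# Hypothetical universe and indiscernibility partition, for illustration only.
UNIVERSE = frozenset({1, 2, 3, 4})
BLOCKS = [frozenset({1, 2}), frozenset({3, 4})]


def approximations(x):
    """Return (lower, upper) approximations of subset x with respect to BLOCKS."""
    lower = frozenset().union(*(b for b in BLOCKS if b <= x))
    upper = frozenset().union(*(b for b in BLOCKS if b & x))
    return lower, upper


def all_subsets(universe):
    """Yield every subset of the universe as a frozenset."""
    items = sorted(universe)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))


def join(p, q):
    """Componentwise union of two rough sets."""
    return (p[0] | q[0], p[1] | q[1])


def meet(p, q):
    """Componentwise intersection of two rough sets."""
    return (p[0] & q[0], p[1] & q[1])


# Every distinct (lower, upper) pair produced by some subset of the universe.
ROUGH_SETS = {approximations(x) for x in all_subsets(UNIVERSE)}

closed = all(join(p, q) in ROUGH_SETS and meet(p, q) in ROUGH_SETS
             for p, q in product(ROUGH_SETS, repeat=2))
bounded = ((frozenset(), frozenset()) in ROUGH_SETS
           and (UNIVERSE, UNIVERSE) in ROUGH_SETS)
distributive = all(meet(p, join(q, r)) == join(meet(p, q), meet(p, r))
                   for p, q, r in product(ROUGH_SETS, repeat=3))

if __name__ == "__main__":
    print(len(ROUGH_SETS), closed, bounded, distributive)  # 9 True True True
```

With two blocks of size two, each block is independently outside the set, in its boundary, or inside it. That gives the 9 rough sets counted here.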
Keywords: Rough set, Data mining, Completion, Discretization, Belief degree, Attribute reduction, Attribute core, Discernibility matrix, Generalized information table, Decision tree, Generalized decision set, Rough-Fuzzy ideal, Lukasiewicz three-valued algebra