Font Size: a A A

Research On Reduct Algorithm In Rough Set Theory

Posted on:2005-11-08Degree:MasterType:Thesis
Country:ChinaCandidate:J LiFull Text:PDF
GTID:2168360125950441Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In recent years, Data Mining has rosed a great attention of the information industry, the main reason is because data ocean has more and more larghe we need new technology to transmute the very large data into useful information or knowledge.Classification is a primary task of Data Mining, the aim of Classification is found a model which can discernibility data objects, in order to we can use this model to forecast the class of a data object which we do not know it's class maker. The main classification methods include Dicision Tree, Beyes classification, Support Vector Machine, Artificial neural Net, Genetic Algorithm, Rough Set and etc.Rough Set theory is founded by a Polish mathematician—Z. Pawlak in 1982, it is a new kinds of mathematical tool to deal with Vagueness and Uncertainty problems. The main advantage of Rough Set theory is that it has no use for any preliminary or additional information about data. The effective reduct algorithm is the foundation to use the rough set theory in data maining and knowledge discovery in database. But it has been proved that the problem about solve all reducts or the minimum reduct is NP-hard problem, so search a fast reduct algorithm is however a main research issue of Rough Set theory.In this paper, we give a summarization of data mining and reduct algorithms in rough set theory, and then we propound a series of reduct algorithms about information system and a series of relative reduct algorithms about decision table, they are all base on the discernibility ability index.The algorithms propounded in this paper are all relate to the division of system, the aim of the division of system is replace the discernibility element in the corresponding region with attribute a(see figure 4.1), thereby, it reduce searching space of the algorithm.In order to replace the discernibility element in the corresponding region with attribute a, we must divide system into any equivalence classes according to attribute a. By the explanation of algorithm's idea in chapter 4, an attribute be supposed to be selected ought to an attribute which has a relatively uniform value distribution and has relatively more attribute value.Discernibility ability index about information system and decision table was respectively propounded, an attribute which has the maximal discernibility ability index is used for the division of system.Certainly, information gain measurement is also be apt to the attribute which has a relatively uniform value distribution and has relatively more attribute value, but information gain can't reflect the idea of the algorithm. By theorem 4.2 and theorem 5.2, we can know that an attribute has biggish discernibility ability when it has a relatively uniform value distribution. Moreover, by theorem 4.3 and theorem 5.3, we can know that the discernibility ability of attribute b should not less than that of attribute when attribute b is the thinning of attribute a. Therefore, discernibility ability index is a good heuristic information of division of system.Discernibility ability index I(a) was defined in chapter 4, some properties about I(a) were presented. And then, we propound a basic reduct algorithms about information system, it is base on the discernibility ability index, the main idea of this algorithm is divide and rule, according to the attribute that has the maximal discernibility ability index, we divide system into some subtable, when we gained the reduct of every subtable, we can use them to solve the reduct of original information system. In this paper, we proved the output of basic algorithm is reduct of information system. After analyze the basic algorithm, we propound a series of algorithms: a reduct algorithm which take into account common term; a reduct algorithm which take into account common term and frequent term; a reduct algorithm which take into account threshold of approximate precision, moreover, algorithm's categoricalness were analyzed. For the sake of predigest the evolvement of Boolean function, the reduct algorithm which tak...
Keywords/Search Tags:Data Mining, Rough Set, Reduct, Discernibility Matrix, Discernibility Ability Index, Partition
PDF Full Text Request
Related items