Research On Rough Set Theory Based Data Mining Algorithm

Posted on: 2011-01-07 | Degree: Master | Type: Thesis
Country: China | Candidate: W J Feng
GTID: 2178330332960304 | Subject: Computer software and theory
Abstract/Summary:
With the development of network and information technology, data mining has attracted widespread attention. Traditional information processing techniques increasingly fail to meet the needs of practical applications, so more efficient and more capable information processing techniques are urgently needed.

Rough set theory, proposed by Pawlak, is a mathematical tool for processing vague and uncertain knowledge. It requires no prior information beyond the data set itself. Knowledge uncertainty is measured through equivalence (indiscernibility) relations on the universe of objects, which gives rough set theory clear advantages in data mining. Rough set theory is applied at several stages of data mining, such as data preprocessing, core attribute computation, attribute reduction and rule generation. Following these stages, this thesis mainly analyzes and studies the discretization of continuous attributes, the computation of the core attributes of a decision table, and the attribute reduction of decision tables. The main contributions and innovations are as follows.

A discretization algorithm for continuous attributes that combines rough set theory with the OPTICS clustering algorithm is proposed. Rough set theory processes decision tables whose values are discrete, so continuous attributes must be discretized first. After an in-depth analysis of the advantages and disadvantages of several existing discretization algorithms, the algorithm combining rough set theory and OPTICS is proposed. Its evaluation criterion is the rough set dependency degree, so that discretization preserves the indiscernibility relation between the condition attributes and the decision attribute. The algorithm is a global discretization algorithm, and the discretized information system has stronger generalization ability.
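The dependency-based evaluation described above can be illustrated with a small sketch. It assumes a decision table stored as a list of Python dicts with decision attribute `d`, and uses the standard dependency degree γ_C(D) = |POS_C(D)| / |U|, where POS_C(D) is the union of the C-equivalence classes whose objects all share one decision value. The sketch does not implement OPTICS: the cut points are supplied by hand and are purely hypothetical, as are the helper names `partition`, `dependency` and `discretize`. A candidate discretization is accepted here only if it leaves γ_C(D) unchanged, which is one plausible reading of "preserves the indiscernibility relation", not necessarily the thesis's exact criterion.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]  # one object of the decision table: attribute -> value


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Equivalence classes U/IND(attrs): objects with identical values on attrs."""
    classes: Dict[Tuple[float, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows: Sequence[Row], cond: Sequence[str], dec: str = "d") -> float:
    """Dependency degree gamma_cond(dec) = |POS_cond(dec)| / |U|."""
    pos = 0
    for block in partition(rows, cond):
        # A class belongs to the positive region if all its objects share one decision.
        if len({rows[i][dec] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)


def discretize(rows: Sequence[Row], attr: str, cuts: Sequence[float]) -> List[Row]:
    """Replace attr by its interval index: the number of cut points below the value."""
    out = []
    for row in rows:
        new = dict(row)
        new[attr] = sum(row[attr] > c for c in cuts)
        out.append(new)
    return out


# Hypothetical decision table with continuous condition attributes a and b.
table = [
    {"a": 1.2, "b": 0.5, "d": 0},
    {"a": 1.8, "b": 0.7, "d": 0},
    {"a": 3.1, "b": 0.6, "d": 1},
    {"a": 3.5, "b": 2.4, "d": 1},
    {"a": 2.0, "b": 2.2, "d": 1},
]

baseline = dependency(table, ["a", "b"])
fine = discretize(discretize(table, "a", [2.5]), "b", [1.0])
coarse = discretize(discretize(table, "a", [2.5]), "b", [3.0])

# Keep a discretization only if it preserves the dependency degree.
for name, cand in (("fine", fine), ("coarse", coarse)):
    g = dependency(cand, ["a", "b"])
    print(name, g, "accepted" if g == baseline else "rejected")
```

On this sample table, cuts at a = 2.5 and b = 1.0 keep γ at 1.0, while moving the b cut to 3.0 merges objects with different decisions into one class and drops γ to 0.4, so that discretization would be rejected.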
Experiments show that the algorithm achieves better discretization results.

A heuristic attribute reduction algorithm based on attribute importance is developed. The study finds that attribute reduction algorithms that use either attribute importance or information entropy as heuristic information are incomplete. Taking both kinds of heuristic information into account, a heuristic attribute reduction algorithm based on attribute importance is developed. The algorithm is grounded in rough set theory and uses attribute importance as the primary criterion and information entropy as the secondary criterion. Experiments show that the algorithm is more complete and more reasonable.
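Combining a primary and a secondary criterion in this way can be sketched as greedy forward selection. This is a minimal illustration under stated assumptions, not the thesis's algorithm: it assumes attribute importance is the dependency-based significance sig(a) = γ_{R∪{a}}(D) − γ_R(D), uses the Shannon conditional entropy H(D | R∪{a}) only to break ties, starts from the core (attributes whose removal lowers γ_C(D)), and stops once γ_R(D) equals γ_C(D). The final backward pruning pass and all function names (`gamma`, `cond_entropy`, `core`, `heuristic_reduct`) are hypothetical additions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Row = Dict[str, int]  # a discretized object: attribute -> value


def blocks(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Equivalence classes of IND(attrs)."""
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())


def gamma(rows: Sequence[Row], attrs: Sequence[str], dec: str = "d") -> float:
    """Dependency degree of the decision on attrs."""
    pos = sum(len(b) for b in blocks(rows, attrs)
              if len({rows[i][dec] for i in b}) == 1)
    return pos / len(rows)


def cond_entropy(rows: Sequence[Row], attrs: Sequence[str], dec: str = "d") -> float:
    """Shannon conditional entropy H(D | attrs), in bits."""
    n = len(rows)
    h = 0.0
    for b in blocks(rows, attrs):
        for count in Counter(rows[i][dec] for i in b).values():
            p = count / len(b)
            h -= (len(b) / n) * p * math.log2(p)
    return h


def core(rows: Sequence[Row], cond: Sequence[str], dec: str = "d") -> List[str]:
    """Attributes whose removal lowers the dependency degree."""
    full = gamma(rows, cond, dec)
    return [a for a in cond
            if gamma(rows, [x for x in cond if x != a], dec) < full]


def heuristic_reduct(rows: Sequence[Row], cond: Sequence[str], dec: str = "d") -> List[str]:
    target = gamma(rows, cond, dec)
    red = core(rows, cond, dec)
    base = list(red)
    while gamma(rows, red, dec) < target:
        rest = [a for a in cond if a not in red]
        # Primary key: significance gamma(R + a) - gamma(R); gamma(R) is fixed within
        # one step, so maximizing gamma(R + a) is equivalent.
        # Secondary key (decides ties only): lower conditional entropy H(D | R + a).
        best = max(rest, key=lambda a: (gamma(rows, red + [a], dec),
                                        -cond_entropy(rows, red + [a], dec)))
        red.append(best)
    # Assumed backward pass: drop added attributes that became redundant.
    for a in [x for x in red if x not in base]:
        trial = [x for x in red if x != a]
        if gamma(rows, trial, dec) == target:
            red = trial
    return red


# Hypothetical discretized decision table: d = a OR b, and e duplicates b.
table = [
    {"a": 0, "b": 0, "c": 0, "e": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "e": 1, "d": 1},
    {"a": 1, "b": 0, "c": 1, "e": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "e": 1, "d": 1},
    {"a": 0, "b": 0, "c": 1, "e": 0, "d": 0},
]
cond = ["a", "b", "c", "e"]
print("core:", core(table, cond))
print("reduct:", heuristic_reduct(table, cond))
```

On this table the core is `['a']`. Adding `b` or `e` both raise γ to 1.0 with zero conditional entropy, so neither criterion separates them; `max` keeps the first candidate in attribute order and the sketch returns `['a', 'b']`. On real data the entropy key matters only when several attributes give the same dependency gain.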
Keywords: Rough set, Data mining, Discretization of continuous attributes, Attribute reduction