
Research And Application On Rough Set Based Data Mining Algorithm

Posted on: 2009-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: X F Hong
Full Text: PDF
GTID: 2178360242989246
Subject: Systems analysis and integration
Abstract/Summary:
Data mining aims to analyze large volumes of source data and reveal the knowledge hidden in them, and it is an active research field within AI. As an efficient mathematical tool for dealing with vagueness and uncertainty, rough set theory provides a new approach to data mining. This thesis studies rough-set-based data mining in both theory and application, addressing the problem that traditional data mining methods cannot handle noisy data effectively. The main topics of the thesis are as follows:

(1) The basic concepts of classical rough set theory are reinterpreted on the basis of the variable precision rough set (VPRS) model. The features and process of applying rough set theory to data mining are analyzed, and directions for further research are pointed out.

(2) To find a minimal reduct, an entropy-based attribute reduction algorithm is proposed. The algorithm starts the reduction from the attribute core, uses the filtered matrix as the selection criterion for candidate attributes, and uses attribute entropy as heuristic information. Experiments show that, compared with other algorithms, it reduces the risk of losing useful attributes and speeds up attribute reduction.

(3) A new decision tree method based on VPRS is proposed. To address the problem that traditional methods cannot classify noisy data, the algorithm uses the boundary region of rough sets as the criterion for selecting splitting attributes. In addition, the concept of leaf-node confidence is redefined, which makes the method easier to understand. Experiments show that decision trees built in this way are more effective and comprehensible.

(4) After studying and analyzing station construction data and the security risks of station construction, the thesis applies data mining techniques to predict those risks. The first step is data cleaning, integration, and transformation, and approaches to computing the similarity of construction data are introduced. Second, 13 attributes are selected from the original 31 using the improved attribute reduction algorithm. Finally, the improved VPRS-based decision tree method is used to classify 1021 construction records and establish a risk assessment model. A rough-set-based data mining software tool is then developed. The experiments show that the improved algorithms are feasible and effective in dealing with noisy data.
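The two building blocks named in the abstract, VPRS regions and entropy as a heuristic for attribute selection, can be illustrated with a minimal Python sketch. It is not the thesis's algorithm. It assumes the commonly used VPRS definition in which an equivalence class joins the lower approximation when its inclusion degree in the target concept is at least a threshold `beta` in (0.5, 1], and joins the upper approximation when that degree exceeds `1 - beta`. The toy table `ROWS`, its attribute names, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical toy decision table; row 4 is a deliberately noisy record.
ROWS = [
    {"depth": "deep", "soil": "soft", "risk": "high"},
    {"depth": "deep", "soil": "soft", "risk": "high"},
    {"depth": "deep", "soil": "soft", "risk": "high"},
    {"depth": "deep", "soil": "soft", "risk": "high"},
    {"depth": "deep", "soil": "soft", "risk": "low"},
    {"depth": "shallow", "soil": "hard", "risk": "low"},
    {"depth": "shallow", "soil": "hard", "risk": "low"},
    {"depth": "shallow", "soil": "soft", "risk": "high"},
    {"depth": "shallow", "soil": "soft", "risk": "low"},
]


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def vprs_regions(rows, attrs, decision, label, beta):
    """Return (lower approximation, boundary region) of decision == label."""
    if not 0.5 < beta <= 1.0:
        raise ValueError("beta must lie in (0.5, 1]")
    target = {i for i, row in enumerate(rows) if row[decision] == label}
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        inclusion = len(block & target) / len(block)
        if inclusion >= beta:        # mostly inside the concept
            lower |= block
        if inclusion > 1.0 - beta:   # not mostly outside the concept
            upper |= block
    return lower, upper - lower


def conditional_entropy(rows, attrs, decision):
    """H(decision | attrs) in bits; lower values mean attrs explain more."""
    n = len(rows)
    total = 0.0
    for block in partition(rows, attrs):
        counts = Counter(rows[i][decision] for i in block)
        for count in counts.values():
            p = count / len(block)
            total -= (len(block) / n) * p * log2(p)
    return total


lower, boundary = vprs_regions(ROWS, ["depth", "soil"], "risk", "high", beta=0.75)
strict_lower, strict_boundary = vprs_regions(ROWS, ["depth", "soil"], "risk", "high", beta=1.0)
```

With `beta=0.75`, the class containing the noisy record is still placed in the lower approximation (rows 0 to 4) and only rows 7 and 8 remain in the boundary. With `beta=1.0`, which corresponds to the classical rough set model, the lower approximation is empty and the boundary grows to seven rows. A smaller boundary region and a lower `conditional_entropy` value are the kinds of quantities a heuristic can use to rank candidate attributes.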
Keywords/Search Tags: Data mining, Rough set, Attribute reduction, Decision tree, Risk assessment