
Study On Incomplete Data Mining Based On Rough Sets And Granular Computing

Posted on: 2013-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: X H Chen
GTID: 2248330374451961
Subject: Computer application technology
Abstract/Summary:
Incomplete data are widespread in the real world across many fields. How to correctly analyze the uncertainty of incomplete data and extract useful knowledge from them to support human behavior and decision-making has become a research hotspot in data mining. Rough set theory has been shown to be a useful mathematical tool for handling imprecise and incomplete data, while granular computing, a newer theory of intelligent computing, solves problems from different levels and viewpoints and provides an efficient mechanism for handling uncertain data. However, research on incomplete data mining based on granular computing remains relatively scarce. This thesis combines rough sets and granular computing to study incomplete data mining, and carries out theoretical analysis and experimental verification for problems such as core computation and attribute reduction. The main contents are as follows:

1. Core computation based on the discernibility matrix. First, a new definition of the discernibility matrix for incomplete decision tables is given, in which the attributes in each matrix element are divided into two categories. Next, a threshold is set based on the global classification ability of the attributes. Finally, core attributes are extracted in two steps from the single-attribute elements of the discernibility matrix using this threshold. An example shows that the new method can exclude attributes with low classification ability from the core. The core obtained by the new method is therefore more reasonable and practical, which helps ensure the reliability of later mining steps.

2. Study of the interval granule model. First, the idea behind granule construction in the interval granule model is analyzed; combining it with rough set theory, the upper approximation, lower approximation and precision of the interval granule model are studied, and the relevant definitions and properties are given. Then several generalized rough set models for incomplete data are analyzed and compared, their advantages and disadvantages are pointed out, and the superiority of the interval granule model is summarized. Finally, for incomplete information systems, the hierarchical structure of set sequences of interval granules and the rough approximations on them are discussed, and the relevant properties are verified by an example.

3. Attribute reduction based on the interval granule model. First, by analyzing a shortcoming of the judgment theorem for attribute redundancy based on the tolerance relation, a new judgment theorem for attribute redundancy based on the interval granule model is proposed. The new theorem makes a comprehensive judgment from the changes in both the upper granule and the lower granule of each interval granule. This avoids the bias caused by the one-directional reduction of tolerance-relation-based models and yields an objective conclusion. On this basis, an attribute reduction method based on the interval granule model is presented for incomplete decision tables.

4. Mining experiments on incomplete data. Using the open-source data mining software Orange, a series of experiments is carried out on UCI data sets to verify the effectiveness of the attribute reduction algorithm based on the interval granule model. A comparative analysis of two sets of test accuracies shows that the algorithm is effective for attribute reduction of incomplete decision tables and helps obtain higher test accuracies when mining incomplete data.

In conclusion, this thesis studies in depth the problem of mining incomplete data based on rough sets and granular computing, and proposes theoretical foundations and key algorithms for decision problems on incomplete data. The results can serve as a basis for constructing feasible solutions to many practical real-world problems, and are of theoretical significance and application value.
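For contribution 1, the sketch below shows the classical baseline that the new core computation refines: a discernibility matrix for an incomplete decision table under tolerance-relation semantics, with the core taken as the union of single-attribute matrix elements. It is not the thesis's method; the abstract does not specify the two attribute categories or the classification-ability threshold, so neither is reproduced. The missing-value marker `"*"`, the dict-per-row table layout and the function names are assumptions made for illustration.

```python
from itertools import combinations

MISSING = "*"  # assumed marker for an unknown attribute value


def discernibility_matrix(rows, cond_attrs, dec_attr):
    """Map each pair (i, j) with different decisions to its discerning attributes.

    Under tolerance semantics a missing value may equal anything, so an
    attribute discerns x and y only if both values are known and differ.
    """
    matrix = {}
    for i, j in combinations(range(len(rows)), 2):
        x, y = rows[i], rows[j]
        if x[dec_attr] == y[dec_attr]:
            continue  # pairs with equal decisions need not be discerned
        matrix[(i, j)] = {
            a for a in cond_attrs
            if x[a] != MISSING and y[a] != MISSING and x[a] != y[a]
        }
    return matrix


def core(matrix):
    """Classical core: every attribute that appears as a singleton element."""
    result = set()
    for element in matrix.values():
        if len(element) == 1:
            result |= element
    return result


# Hypothetical toy table: condition attributes a, b, c; decision d.
rows = [
    {"a": 1, "b": 0, "c": MISSING, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": MISSING, "b": 1, "c": 0, "d": "yes"},
]
m = discernibility_matrix(rows, ["a", "b", "c"], "d")
print(sorted(core(m)))  # ['a', 'b']
```

Here the pair (2, 3) yields an empty element because the missing value of `a` makes the two objects tolerant, so the baseline simply ignores it. The thesis's threshold step would act on the singleton elements before they enter the core.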
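Contribution 2 compares generalized rough set models for incomplete data. As a reference point, the following sketch computes lower and upper approximations and the approximation precision (|lower| / |upper|) under the tolerance relation, one of the standard generalized models. The interval granule model itself is not given in the abstract, so this is only the baseline it is compared against; the `"*"` marker, the toy table and all names are illustrative assumptions.

```python
MISSING = "*"  # assumed marker for an unknown attribute value


def tolerant(x, y, attrs):
    """True if x and y agree on every attribute where both values are known."""
    return all(x[a] == y[a] or MISSING in (x[a], y[a]) for a in attrs)


def tolerance_class(rows, i, attrs):
    """Indices of all objects tolerant with object i on attrs."""
    return {j for j, y in enumerate(rows) if tolerant(rows[i], y, attrs)}


def approximations(rows, attrs, target):
    """Tolerance-based lower and upper approximations of a set of indices."""
    lower, upper = set(), set()
    for i in range(len(rows)):
        cls = tolerance_class(rows, i, attrs)
        if cls <= target:   # class lies entirely inside the target
            lower.add(i)
        if cls & target:    # class overlaps the target
            upper.add(i)
    return lower, upper


def precision(lower, upper):
    """Approximation precision |lower| / |upper| (1.0 for an empty upper set)."""
    return len(lower) / len(upper) if upper else 1.0


# Hypothetical toy table: condition attributes a, b, c; decision d.
rows = [
    {"a": 1, "b": 0, "c": MISSING, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": MISSING, "b": 1, "c": 0, "d": "yes"},
]
target = {i for i, r in enumerate(rows) if r["d"] == "yes"}
lower, upper = approximations(rows, ["a", "b", "c"], target)
print(lower, upper, precision(lower, upper))  # {0} {0, 2, 3} 0.3333333333333333
```

Objects 2 and 3 fall into the same tolerance class because of the missing value, which pulls object 2 into the upper approximation and keeps object 3 out of the lower one.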
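Contribution 3 starts from the tolerance-relation redundancy judgment that the new theorem improves on. The sketch below implements that baseline, not the interval-granule theorem (whose conditions the abstract does not state): an attribute is treated as redundant if dropping it leaves every object's generalized decision (the set of decisions in its tolerance class) unchanged, and attributes are removed greedily in a fixed order. The `"*"` marker, the toy table and the function names are assumptions.

```python
MISSING = "*"  # assumed marker for an unknown attribute value


def tolerant(x, y, attrs):
    """True if x and y agree on every attribute where both values are known."""
    return all(x[a] == y[a] or MISSING in (x[a], y[a]) for a in attrs)


def generalized_decision(rows, attrs, dec_attr):
    """For each object, the set of decisions found in its tolerance class."""
    return [
        frozenset(y[dec_attr] for y in rows if tolerant(x, y, attrs))
        for x in rows
    ]


def backward_reduct(rows, cond_attrs, dec_attr):
    """Greedy backward elimination that preserves the generalized decision."""
    target = generalized_decision(rows, cond_attrs, dec_attr)
    kept = list(cond_attrs)
    for a in cond_attrs:
        trial = [b for b in kept if b != a]
        if generalized_decision(rows, trial, dec_attr) == target:
            kept = trial  # a is redundant under the tolerance-based judgment
    return kept


# Hypothetical toy table: condition attributes a, b, c; decision d.
rows = [
    {"a": 1, "b": 0, "c": MISSING, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": MISSING, "b": 1, "c": 0, "d": "yes"},
]
print(backward_reduct(rows, ["a", "b", "c"], "d"))  # ['a', 'b']
```

Because removing attributes can only enlarge tolerance classes, an attribute rejected early stays non-removable later, so the result is a subset from which no single attribute can be dropped. The judgment looks only at how tolerance classes grow, which is the one-directional view that, according to the abstract, the interval-granule theorem supplements with changes in the lower granule.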
Keywords/Search Tags: Incomplete data, Granular computing, Rough sets, Attribute reduction, Data mining