Applications Of Rough Set Theory In Data Mining

Posted on: 2004-06-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Ma
Full Text: PDF
GTID: 1118360122471277
Subject: Control Science and Engineering
Abstract/Summary:
This dissertation focuses on applications of rough set theory and rough analysis in data mining. Rough set theory is a relatively new mathematical tool for dealing with uncertain knowledge, and the thesis applies it to several open problems in data mining. Information theory and multivariate statistics are brought into rough analysis together with rough set theory and other techniques, and new results are given for knowledge discovery, association rule mining, pattern classification, and data cleaning. After a brief survey of data mining and rough set theory, the research in the thesis can be described as follows:

1. Finding rules in very large data sets is time-consuming, and existing algorithms scale poorly. To address this, the thesis proposes a method for decomposing a large data table based on the rough set concepts of positive region and attribute importance. Existing rule deduction algorithms can be applied directly to the tree structure produced by the partition, and the computation time is reduced noticeably. An information entropy check on the partition structure shows that partitioning the data table does not lose information while computation becomes faster, which supports the practicality and soundness of partitioning large data tables.

2. Mining association rules normally requires searching the data table many times. To address this, the thesis introduces an algorithm that uses the equivalence class concept of rough analysis to mine single-dimensional Boolean association rules. The algorithm uses multiple minimum support thresholds instead of a single one, which overcomes the limitation of a single threshold in finding frequent items and yields a more effective rule set that includes more significant rules. Interestingness is an important index for evaluating rules, and the thesis introduces a way of evaluating subjective interestingness to help users discover more significant rules.

3. An attribute selection and reduction method is presented. Factor analysis is used to divide the conditional attributes into groups, so that the conditional attributes in one group are relevant to the corresponding factor, and the factors are related to the target concept through a linear combination. Information entropy is then used to evaluate whether each attribute group and each attribute is strongly correlated with the corresponding target concept and factor. Attributes that are correlated with the target concept are retained, and irrelevant attributes are deleted.

4. Combining this attribute selection method with rough analysis, the thesis puts forward a classifier modeling method that can delete redundant attributes and deduce rules based on rough analysis. To handle the multiple-match and no-match problems that arise when predicting class labels for objects with an unknown target concept, the thesis defines a partial matching function and a flexible matching function. The class labels of such objects are predicted from the computed function values.

5. Data preparation is an essential step before data mining. The thesis presents two algorithms for two data preparation problems: filling in missing values and finding duplicates. First, rough analysis is used to predict missing values from known values, which the thesis reports gives more accurate results. Then data table partitioning and quicksort are used to find duplicates, which reduces the search time.

Finally, the thesis gives a summary conclusion and proposes future research directions.
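Item 1 of the abstract relies on the rough set notions of positive region and attribute importance. The sketch below illustrates only the standard textbook definitions of those two quantities, not the thesis's own partitioning algorithm, which the abstract does not spell out. The function names and the toy decision table are hypothetical and chosen for illustration.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def positive_region(rows, cond_attrs, decision):
    """Indices of rows whose condition class maps to exactly one decision value."""
    pos = set()
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos

def dependency(rows, cond_attrs, decision):
    """Fraction of rows in the positive region (degree of dependency)."""
    return len(positive_region(rows, cond_attrs, decision)) / len(rows)

def significance(rows, cond_attrs, decision, attr):
    """Drop in dependency when attr is removed from the condition attributes."""
    rest = [a for a in cond_attrs if a != attr]
    return dependency(rows, cond_attrs, decision) - dependency(rows, rest, decision)

# Hypothetical toy decision table: "play" is the decision attribute.
rows = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
conds = ["outlook", "windy"]

for a in conds:
    print(a, significance(rows, conds, "play", a))
```

In this toy table the decision depends only on `windy`, so removing `windy` empties the positive region while removing `outlook` leaves it unchanged. An attribute with high significance in this sense is a natural candidate for splitting a large table, although how the thesis actually builds its tree structure is an assumption not confirmed by the abstract.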
Keywords/Search Tags:Data mining, rough analysis, rule deduction, associative rule, classification, data preparation