
Study On Classification In Data Mining Based On Cloud Model And Rough Set

Posted on: 2008-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Song
Subject: Applied Mathematics
GTID: 2178360215458602
Abstract/Summary:
Classification is an important research field in data mining and has a wide range of applications. The decision tree is one of the most commonly used classification models, and it has been widely studied and applied since it was introduced. However, decision trees cannot handle missing data and continuous data effectively, and the knowledge they express suffers from complexity and uncertainty. Research on decision trees therefore remains an active topic in data mining.

The cloud model, which combines fuzziness and randomness, was introduced to represent the uncertainty of concepts. It realizes the uncertain transformation between a qualitative concept and its quantitative description. In addition, the characteristic relation-based rough set was proposed for incomplete information systems. It extends the conventional rough set and can deal with incomplete data directly.

This thesis focuses on several key problems of classification in data mining, using the cloud model and the characteristic relation-based rough set. The main contributions are as follows.

1. An approach is presented for incrementally updating the approximations of a concept and for extracting rules under the characteristic relation-based rough set. A series of experiments shows that the proposed approach can handle dynamic attribute generalization and perform rule extraction effectively.
2. The cloud model is discussed, including its related concepts, its theoretical foundation, and the cloud transform for discretizing continuous data.
3. A new algorithm, DTCCRS, is presented for constructing decision trees based on the cloud model and the characteristic relation-based rough set. It first uses the cloud transform to discretize continuous data. It then selects as the splitting node the attribute with the smallest weighted mean roughness under the characteristic relation-based rough set. Experiments show that the algorithm handles incomplete data and discretizes continuous data effectively. Compared with trees built by C5.0, the decision trees constructed by DTCCRS tend to have a simpler structure, higher classification accuracy, and more understandable rules.
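The abstract relies on the normal cloud model without defining it. The sketch below uses the standard three numerical characteristics of a cloud: expectation Ex, entropy En and hyper-entropy He. It implements the usual forward normal cloud generator and one commonly used backward generator, under those assumptions. It does not implement the cloud transform itself. The helper `discretize` and the concept names and parameters are hypothetical. They assume the clouds have already been extracted and map a continuous value to the concept with the highest expected certainty degree.

```python
# Sketch of the normal cloud model C(Ex, En, He): expectation, entropy, hyper-entropy.
import math
import random
from typing import Dict, List, Tuple


def forward_normal_cloud(ex: float, en: float, he: float, n: int,
                         seed: int = 0) -> List[Tuple[float, float]]:
    """Generate n cloud drops (x, mu) for the qualitative concept C(Ex, En, He)."""
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        en_i = abs(rng.gauss(en, he))                      # En' ~ N(En, He^2): the spread is itself random
        x = rng.gauss(ex, en_i)                            # quantitative value x ~ N(Ex, En'^2)
        mu = math.exp(-((x - ex) ** 2) / (2 * en_i ** 2))  # certainty degree of x for the concept
        drops.append((x, mu))
    return drops


def backward_normal_cloud(xs: List[float]) -> Tuple[float, float, float]:
    """Estimate (Ex, En, He) from sample values alone (no certainty degrees needed)."""
    n = len(xs)
    ex = sum(xs) / n
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in xs) / n
    s2 = sum((x - ex) ** 2 for x in xs) / (n - 1)  # sample variance
    he = math.sqrt(max(s2 - en ** 2, 0.0))         # clamp: sampling noise can make this negative
    return ex, en, he


def discretize(x: float, clouds: Dict[str, Tuple[float, float, float]]) -> str:
    """Label x with the concept whose expected certainty exp(-(x - Ex)^2 / (2 En^2)) is largest."""
    return max(clouds, key=lambda c: math.exp(-((x - clouds[c][0]) ** 2) / (2 * clouds[c][1] ** 2)))


# Hypothetical concepts (Ex, En, He), standing in for the output of a cloud transform.
clouds = {"low": (10.0, 3.0, 0.3), "medium": (25.0, 3.0, 0.3), "high": (40.0, 4.0, 0.4)}
print([discretize(v, clouds) for v in (12.0, 31.0, 33.0)])  # prints ['low', 'medium', 'high']

drops = forward_normal_cloud(25.0, 3.0, 0.3, n=20000, seed=42)
print(backward_normal_cloud([x for x, _ in drops]))  # Ex and En come out close to 25 and 3
```

Using the expected certainty and ignoring He keeps the labelling deterministic. A randomized variant would draw a fresh En' for each value, as the forward generator does.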
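The abstract does not define the characteristic relation-based rough set or the weighted mean roughness used to choose splitting nodes. The sketch below illustrates the idea under several assumptions:

- `?` marks a lost value and `*` marks a "do not care" value.
- Approximations are the singleton lower and upper approximations.
- Roughness is taken as 1 - |lower| / |upper|.
- Each decision class is weighted by its relative size.

It recomputes the approximations from scratch rather than updating them incrementally as the thesis does, and the toy table is hypothetical.

```python
# Sketch of characteristic-relation rough sets for incomplete tables.
# Assumptions (not from the thesis): "?" = lost value, "*" = "do not care" value,
# singleton approximations, roughness = 1 - |lower| / |upper|,
# class weights = relative class size. No incremental updating.
from typing import Dict, List, Set

LOST = "?"         # lost value: the case is left out of every block of that attribute
DO_NOT_CARE = "*"  # "do not care": the case is put into every block of that attribute

Row = Dict[str, str]


def characteristic_set(table: List[Row], x: int, attrs: List[str]) -> Set[int]:
    """K_B(x): intersection of K(x, a) over attrs; K(x, a) is all cases if a(x) is missing."""
    k = set(range(len(table)))
    for a in attrs:
        v = table[x][a]
        if v in (LOST, DO_NOT_CARE):
            continue  # K(x, a) = U leaves the intersection unchanged
        # block of (a, v): cases with value v, plus cases whose value is "do not care"
        k &= {y for y, row in enumerate(table) if row[a] in (v, DO_NOT_CARE)}
    return k


def approximations(table: List[Row], attrs: List[str], concept: Set[int]):
    """Singleton lower and upper approximations of a concept (a set of case indices)."""
    lower, upper = set(), set()
    for x in range(len(table)):
        k = characteristic_set(table, x, attrs)
        if k <= concept:
            lower.add(x)
        if k & concept:
            upper.add(x)
    return lower, upper


def roughness(table: List[Row], attrs: List[str], concept: Set[int]) -> float:
    """0.0: concept exactly definable from attrs; 1.0: no case certainly belongs to it."""
    lower, upper = approximations(table, attrs, concept)
    return 1.0 - len(lower) / len(upper) if upper else 0.0


def weighted_mean_roughness(table: List[Row], attr: str, decision: str) -> float:
    """Roughness of each decision class w.r.t. {attr}, weighted by class size."""
    classes: Dict[str, Set[int]] = {}
    for i, row in enumerate(table):
        classes.setdefault(row[decision], set()).add(i)
    n = len(table)
    return sum(len(c) / n * roughness(table, [attr], c) for c in classes.values())


def choose_split(table: List[Row], attrs: List[str], decision: str) -> str:
    """Condition attribute with the smallest weighted mean roughness."""
    return min(attrs, key=lambda a: weighted_mean_roughness(table, a, decision))


# Hypothetical toy table; the decision attribute "flu" is assumed to be always known.
table = [
    {"temp": "high", "headache": "?", "flu": "yes"},
    {"temp": "high", "headache": "yes", "flu": "yes"},
    {"temp": "*", "headache": "no", "flu": "no"},
    {"temp": "normal", "headache": "no", "flu": "no"},
    {"temp": "high", "headache": "no", "flu": "no"},
]
print(approximations(table, ["temp", "headache"], {0, 1}))  # prints ({1}, {0, 1})
print(choose_split(table, ["temp", "headache"], "flu"))     # prints headache
```

On this table, `headache` has a weighted mean roughness of 0.35 and `temp` has 0.88. `headache` would therefore be chosen as the splitting attribute, because its known values separate the two decision classes more cleanly.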
Keywords/Search Tags: Data mining, decision tree, classification, rough set, cloud transform