Knowledge discovery in databases (KDD) is a flourishing research field that draws on statistics, artificial intelligence, and database systems. Data mining is the process of extracting interesting, potentially useful, valid, and understandable knowledge from data. Classification is an important branch of data mining: it builds a model that describes a predetermined set of data classes or concepts, and the model is then used to predict the class label of a test sample.

Rough set theory, proposed by the Polish mathematician Pawlak, is used to represent uncertain knowledge. It has become one of the main methods for KDD because of its distinctive advantages in knowledge discovery. Entropy is a concept from information theory that is widely used in data analysis.

This thesis presents the RSE algorithm model, which is based on rough set theory and entropy theory and consists of two components: a classification model and a prediction model. The classification model is based on classical rough set theory and entropy theory: it selects attributes according to entropy, determines the equivalence classes according to the indiscernibility relation, and then extracts the classification rules. The prediction model is based on an extended rough set model, tolerance rough set theory: it predicts the class label of a test sample according to the tolerance relation defined between a sample and a rule.

In addition, we designed a prototype system named R-DM, which implements the classification and prediction models of both the RSE algorithm and the ID3 algorithm. On this uniform platform, we compared the RSE algorithm with the ID3 algorithm on standard UCI data sets. In these experiments, the RSE algorithm outperformed the ID3 algorithm.
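
The classification steps named above (entropy-based attribute selection, equivalence classes under the indiscernibility relation, rule extraction) can be illustrated with a minimal Python sketch. It is not the RSE algorithm as implemented in R-DM: the function names, the toy decision table, and the choice of conditional entropy as the selection criterion are all assumptions made for illustration, and the tolerance-based prediction model is not covered.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def equivalence_classes(rows, attrs):
    """Group rows that are indiscernible on attrs (same value for every attribute)."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row)
    return dict(classes)


def conditional_entropy(rows, attrs, decision):
    """Entropy of the decision attribute, weighted over the equivalence classes."""
    total = len(rows)
    return sum(
        len(block) / total * entropy([r[decision] for r in block])
        for block in equivalence_classes(rows, attrs).values()
    )


def select_attribute(rows, candidates, decision):
    """Assumed criterion: pick the attribute with the lowest conditional entropy."""
    return min(candidates, key=lambda a: conditional_entropy(rows, [a], decision))


def extract_rules(rows, attrs, decision):
    """Emit one rule per equivalence class whose rows all share one decision value."""
    rules = []
    for key, block in equivalence_classes(rows, attrs).items():
        labels = {r[decision] for r in block}
        if len(labels) == 1:
            rules.append((dict(zip(attrs, key)), labels.pop()))
    return rules


# Hypothetical toy decision table, not taken from the UCI data sets.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
best = select_attribute(rows, ["outlook", "windy"], "play")
rules = extract_rules(rows, [best], "play")
```

In this toy table the "outlook" attribute alone determines the decision, so it is selected and each of its two equivalence classes yields one consistent rule. Inconsistent classes (rows with the same attribute values but different decisions) are simply skipped here; how the thesis handles them is not stated in the abstract.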