
Data Mining Based On Decision Tree And The Application In Chemical Pattern Classification

Posted on: 2006-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
GTID: 2121360152971840
Subject: Chemical Engineering
Abstract/Summary:
As computer technology and modern analytical techniques have developed, the volume of data in chemistry and chemical engineering has grown rapidly. Data mining provides a powerful tool for extracting useful information hidden in such data. However, the effectiveness of these methods depends closely on the characteristics of the data in each field. Chemical classification data are often high-dimensional, noisy, and multicollinear. This thesis focuses on discretization, feature selection, rule generation, and chemical pattern modeling. It also introduces several data mining methods and the theory of decision trees. The main contributions are as follows:

(1) Most data sets for chemical classification are continuous, so discretization is necessary to improve the performance of a decision tree classifier. After analyzing the characteristics of the data, we adopt the Minimal Description Length Principle (MDLP) to discretize the continuous attributes. Compared with other discretization methods, MDLP is more stable and more effective.

(2) Most data sets contain redundant attributes. These attributes increase the workload and complexity of data processing and reduce the efficiency of the classifier. We therefore use feature selection as a preprocessing step for classification. After introducing the principles and methods of feature selection, we apply the Las Vegas Filter (LVF) to select a suitable attribute subset from the discretized data set. The results show that feature selection can find a subset closely related to the classification outcome. They also show that feature selection can improve classification accuracy to some extent.

(3) We introduce the principles of decision trees and several tree-induction algorithms, in particular C4.5.
Using the C4.5 algorithm, we built a decision tree for a specific case and obtained good results. Compared with artificial neural networks and statistical methods, decision trees do not depend on the distribution of the data set. They also produce classification rules that are explicit and easy to understand.

(4) For the classification of continuous chemical data sets, we propose a decision tree method with a preprocessing stage consisting of discretization and feature selection. Satisfactory results on two examples show that the method has good predictive capability and is well suited to data mining in chemical pattern classification.
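Contribution (1) relies on MDLP discretization. The abstract does not say which MDLP variant is used. The sketch below is a minimal Python illustration that assumes the standard Fayyad and Irani entropy-based formulation:

- Recursively pick the binary cut that minimizes the weighted class entropy.
- Accept that cut only if the information gain exceeds (log2(N-1) + Δ)/N.
- Here Δ = log2(3^k - 2) - [k·Ent(S) - k1·Ent(S1) - k2·Ent(S2)].

The function names `mdlp_cut_points` and `discretize` are hypothetical and are not taken from the thesis.

```python
import bisect
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_cut_points(values, labels):
    """Sorted cut points for one continuous attribute (Fayyad-Irani MDLP, assumed)."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    cuts = []

    def split(lo, hi):
        seg = ys[lo:hi]
        n = hi - lo
        ent_s = entropy(seg)
        best = None
        # Try every boundary between two distinct adjacent values.
        for i in range(lo + 1, hi):
            if xs[i] == xs[i - 1]:
                continue
            left, right = ys[lo:i], ys[i:hi]
            e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if best is None or e < best[0]:
                best = (e, i)
        if best is None:
            return
        e, i = best
        left, right = ys[lo:i], ys[i:hi]
        gain = ent_s - e
        k, k1, k2 = len(set(seg)), len(set(left)), len(set(right))
        delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * entropy(left) - k2 * entropy(right))
        # MDL acceptance test: keep the cut only if it pays for its encoding cost.
        if gain > (math.log2(n - 1) + delta) / n:
            cuts.append((xs[i - 1] + xs[i]) / 2)
            split(lo, i)
            split(i, hi)

    split(0, len(xs))
    return sorted(cuts)


def discretize(value, cuts):
    """Map a continuous value to the index of the interval it falls into."""
    return bisect.bisect_right(cuts, value)
```

For example, `mdlp_cut_points(list(range(1, 13)), ['a'] * 6 + ['b'] * 6)` returns `[6.5]`. An attribute whose instances all share one class yields no cuts, because splitting it gains nothing.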
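Contribution (2) applies the Las Vegas Filter to the discretized data. The following is a simplified Python sketch that assumes the usual LVF formulation:

- Draw random attribute subsets.
- Measure each subset's inconsistency rate. This is the share of instances that agree on the selected attributes but do not belong to the majority class of their group.
- Keep the smallest subset whose rate does not exceed a threshold `gamma`.

The names `lvf` and `inconsistency_rate`, the default `gamma`, `max_tries`, and `seed` are hypothetical choices. The published LVF also records ties among equally small subsets, which this sketch omits.

```python
import random
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Fraction of instances that conflict with the majority class of their pattern."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row[j] for j in subset)][y] += 1
    conflicts = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return conflicts / len(rows)


def lvf(rows, labels, gamma=0.0, max_tries=1000, seed=0):
    """Simplified Las Vegas Filter: smallest random subset with rate <= gamma."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    best = list(range(n_features))  # start from the full attribute set
    for _ in range(max_tries):
        if len(best) == 1:
            break
        # Only try subsets strictly smaller than the current best one.
        size = rng.randint(1, len(best) - 1)
        subset = sorted(rng.sample(range(n_features), size))
        if inconsistency_rate(rows, labels, subset) <= gamma:
            best = subset
    return best
```

A common choice for `gamma` is the inconsistency rate of the full attribute set. With that setting, the selected subset separates the classes no worse than all attributes together.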
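Contribution (3) builds the classifier with C4.5. C4.5 chooses the attribute at each node by its gain ratio, which is information gain divided by split information. The Python sketch below illustrates only that selection step on already discretized (categorical) attributes, such as the output of the preprocessing in (1) and (2).

It deliberately omits several parts of the full algorithm:

- pruning
- continuous-threshold handling
- missing-value handling
- C4.5's restriction to attributes with above-average gain

It is an illustration under these assumptions, not the thesis's implementation. The names `gain_ratio`, `build_tree`, and `predict` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of a categorical attribute: info gain / split info."""
    n = len(labels)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    cond = sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    gain = entropy(labels) - cond
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs):
    """Grow an unpruned tree; leaves are class labels, inner nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return {"attr": best, "default": majority, "children": children}


def predict(tree, row):
    """Follow the path for one instance; unseen values fall back to the majority."""
    while isinstance(tree, dict):
        child = tree["children"].get(row[tree["attr"]])
        if child is None:
            return tree["default"]
        tree = child
    return tree
```

Each root-to-leaf path reads as one explicit if-then rule. In a toy data set where attribute 0 takes the values "low" and "high", a path gives a rule such as "IF attribute 0 = low THEN class A". This reflects the interpretability that the abstract credits to decision trees.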
Keywords: data mining, decision tree, discretization, feature selection, chemical pattern classification