
Research on an Improved Algorithm for Incremental Induction of Decision Trees

Posted on: 2008-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Liu
Full Text: PDF
GTID: 2178360272968280
Subject: Computer application technology
Abstract/Summary:
In online classification systems, such as customer behavior analysis, Web log analysis, and intrusion detection, an important problem is making the classifier adapt to new samples so that it keeps classifying correctly and continues to operate. Several incremental decision tree induction algorithms already exist to address the data increment problem. However, these algorithms are generally expensive in storage, because they preserve a large amount of historical sample data. In addition, to keep the tree structure consistent with that of a traditionally built decision tree, they must test the structure of the tree and adjust it whenever a new sample arrives, and this adjustment carries a computational cost. As a result, they cannot meet the needs of online classification systems. Moreover, the data increment problem is a comparatively simple kind of increment problem; class increment and attribute increment problems also exist and are especially complicated in real-world settings. Traditional incremental decision tree algorithms have focused on the data increment problem and have ignored the class increment and attribute increment problems.

To solve these three kinds of increment problems, an improved hybrid classifier algorithm is proposed, based on research into decision tree induction algorithms and the Bayesian method. The new algorithm combines the merits of decision tree induction and the naive Bayesian method: it retains the good interpretability of a decision tree and has good incremental learning ability. When an increment problem occurs, the algorithm applies the already learned model to the newly added samples and carries out incremental learning on the basis of the historical count information and the new samples, so as to acquire the knowledge contained in those samples. This ensures that the classifier remains real-time and effective.

To evaluate the performance of the new hybrid classifier algorithm, a comparison experiment between the new algorithm and the existing decision tree induction algorithm is presented. The experimental data come from the UCI standard database. The results show that the new algorithm handles the increment problems in data mining easily and well. Compared with rebuilding the decision tree with the traditional method, the new algorithm takes less time and classifies samples more accurately, so it is more suitable for online classification systems.
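
The abstract does not give the details of the hybrid algorithm, so the following is only a minimal sketch of the count-based idea it mentions: a naive Bayes learner over categorical attributes that absorbs each new sample by updating counts rather than storing historical samples. The class name `IncrementalNaiveBayes`, the `alpha` smoothing parameter, and the dictionary-based sample format are assumptions made for illustration; the decision tree part of the hybrid classifier is not modeled here.

```python
from collections import defaultdict
import math


class IncrementalNaiveBayes:
    """Count-based naive Bayes for categorical data (illustrative sketch,
    not the thesis's hybrid algorithm)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                     # Laplace smoothing (assumed)
        self.n = 0                             # number of samples seen
        self.class_counts = defaultdict(int)   # class -> count
        self.value_counts = defaultdict(int)   # (attr, value, class) -> count
        self.values = defaultdict(set)         # attr -> set of values seen

    def update(self, sample, label):
        """Absorb one labeled sample by incrementing counts only."""
        self.n += 1
        self.class_counts[label] += 1          # a new class adds a new key
        for attr, value in sample.items():     # a new attribute adds new keys
            self.value_counts[(attr, value, label)] += 1
            self.values[attr].add(value)

    def predict(self, sample):
        """Return the class with the highest smoothed log-posterior."""
        best_label, best_score = None, -math.inf
        for label, c_count in self.class_counts.items():
            score = math.log(c_count / self.n)
            for attr, value in sample.items():
                if attr not in self.values:
                    continue                   # attribute never seen: skip it
                k = len(self.values[attr])
                num = self.value_counts.get((attr, value, label), 0) + self.alpha
                score += math.log(num / (c_count + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Data increment: samples arrive one at a time.
nb = IncrementalNaiveBayes()
nb.update({"outlook": "sunny", "windy": "no"}, "play")
nb.update({"outlook": "sunny", "windy": "no"}, "play")
nb.update({"outlook": "rainy", "windy": "yes"}, "stay")

# Class increment: a label that was never seen before.
nb.update({"outlook": "snowy", "windy": "no"}, "ski")

# Attribute increment: a sample carrying a previously unseen attribute.
nb.update({"outlook": "sunny", "windy": "no", "humidity": "high"}, "play")
```

In this sketch, all three kinds of increment reduce to adding or incrementing dictionary entries, which is why no historical samples need to be kept. One simplification to note: samples seen before a new attribute appeared contribute no counts for it, so the estimates for that attribute rest only on later samples.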
Keywords/Search Tags: Data Mining, Decision Tree, Naive Bayes, Incremental Learning, Estimated Probability