Analysis of Classification Algorithms and Their Design and Implementation on MinerOnWeb

Posted on: 2010-01-30, Degree: Master, Type: Thesis
Country: China, Candidate: C L Zhang, Full Text: PDF
GTID: 2208360275983549, Subject: Computer software and theory
Abstract/Summary:
In recent decades, the rapid development of computer hardware and software, and especially of Internet technology, has caused the volume of accumulated data to grow very quickly, and data mining has become a research hotspot. Classification, one of the main data mining methods, has attracted increasing attention because of its wide range of uses.

This thesis focuses on several problems in classification algorithms. It consists of three parts: a method for applying classification models, model visualization, and an improved ID3 algorithm.

1) The method of classification model application: The task of classification is to predict the class of new data using rules summarized from training data. In practice this is difficult: the rules may be intricate, and a large rule set is hard to apply even when each individual rule is easy to understand. To address these problems, this thesis presents a classification model application system, designed and implemented on the MinerOnWeb data mining system. With this system, people can apply the rules easily even if they cannot comprehend them. The model application system comprises four processes: model extraction, model storage, model comparison, and model application.

2) Model visualization: Visualization technology helps people understand the model structure and the results of model application. In this thesis it is used in the model application stage. Both model visualization and visualization of model application results are designed and implemented in the MinerOnWeb data mining system.

3) The improved ID3 algorithm: ID3 is a decision tree algorithm that is important in the field of machine learning. The concept of information gain was introduced by Quinlan in the ID3 algorithm. Information gain is the criterion for selecting the best splitting attribute when inducing a decision tree.
Nevertheless, this algorithm has some drawbacks, one of which is that it tends to choose a multi-valued attribute (one with many distinct values) as the best splitting attribute, even though such an attribute is not necessarily important for classification in the real world. This thesis presents a revised information gain for the ID3 algorithm to solve this problem, and the new algorithm has been implemented in the MinerOnWeb system. Theoretical analysis and experimental results indicate that the new method effectively reduces the multi-value bias of the ID3 algorithm.
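
The multi-value bias described above can be illustrated with a minimal Python sketch of the standard ID3 information gain. This is not the revised measure proposed in the thesis, whose formula the abstract does not give. The helper names (`entropy`, `information_gain`) and the toy data set are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Standard ID3 information gain of splitting `rows` on `attribute`."""
    labels = [row[target] for row in rows]
    # Group the class labels by the value each row takes for the attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "id" is unique per row and carries no real meaning,
# while "windy" is an ordinary two-valued attribute.
rows = [
    {"id": "r1", "windy": "no", "play": "yes"},
    {"id": "r2", "windy": "no", "play": "yes"},
    {"id": "r3", "windy": "yes", "play": "no"},
    {"id": "r4", "windy": "yes", "play": "yes"},
]

gain_id = information_gain(rows, "id", "play")
gain_windy = information_gain(rows, "windy", "play")
print("gain(id)    =", round(gain_id, 4))
print("gain(windy) =", round(gain_windy, 4))
```

Because every value of `id` isolates a single row, each partition is pure and the gain equals the full entropy of the class labels, so plain ID3 would pick `id` over `windy` even though `id` cannot generalize to new data. A revised gain measure has to counteract exactly this behavior.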
Keywords/Search Tags: Data Mining, Classification model application, ID3, Visualization