Research Of Classification Algorithms Based On Decision Tree

Posted on: 2007-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: J H Hu
GTID: 2178360182480758
Subject: Computer application technology
Abstract/Summary:
Data mining is a product of the evolution of information technology: a complex process of extracting implicit, valuable patterns, knowledge, and rules from large datasets. Within this process, data classification is an important research topic. Many classification techniques exist at present, including decision tree induction, association-rule classification, Bayesian classification and Bayesian belief networks, genetic algorithms, neural networks, and rough sets. The decision tree method has been widely studied and applied because its theory is clear and systematic and because it is readily available to researchers. In addition, a decision tree is easily transformed into "IF-THEN" classification rules.

This thesis focuses on decision tree algorithms for classification. First, it covers the basics of decision trees and discusses several representative induction algorithms: ID3, the classical algorithm; C4.5, which handles continuous attributes and missing attribute values and overcomes ID3's bias toward attributes with many values; CART, which uses the Gini index for attribute selection and induces a binary tree; and SLIQ and SPRINT, which are scalable, easily parallelized, and not limited by main-memory size.

Second, drawing on the strengths and weaknesses of these algorithms, a new algorithm based on ID3 is proposed. It uses the Taylor formula to simplify the entropy computation and raises the information entropy of each attribute to a power N, where N is determined by the number of values the attribute takes. The new algorithm reduces the amount of computation and improves efficiency, and it makes a more reasonable choice of splitting attribute.

Finally, both ID3 and the new ID3-based algorithm are implemented in Java on the Eclipse platform. Tests on several real training datasets show that the new algorithm builds the decision tree faster and with lower time complexity, while also overcoming ID3's bias toward attributes with many values. Furthermore, classification performance improves as the dataset grows. Both the theoretical analysis and the experimental comparison show that the algorithm proposed in this thesis outperforms ID3 and gives good classification results.
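The ID3 weakness discussed in the abstract can be illustrated with a small Java sketch of the standard entropy and information-gain computation. This is not the thesis's implementation and does not reproduce the proposed Taylor-formula or power-N modification, whose details the abstract does not give. The class name Id3Sketch, the method names, and the toy table are assumptions made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class Id3Sketch {

    // Hypothetical toy table: column 0 is a unique record id, column 1 is
    // a two-valued attribute, and the last column is the class label.
    public static final String[][] TOY = {
        {"r1", "yes", "no"},
        {"r2", "yes", "no"},
        {"r3", "no",  "yes"},
        {"r4", "no",  "yes"},
        {"r5", "yes", "yes"},
        {"r6", "no",  "no"},
    };

    /** Shannon entropy, in bits, of a vector of class counts. */
    public static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) {
                continue; // the term 0 * log(0) is taken to be 0
            }
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** Information gain of splitting rows on column attr (label = last column). */
    public static double informationGain(String[][] rows, int attr) {
        int labelCol = rows[0].length - 1;
        Map<String, Integer> overall = new HashMap<>();
        Map<String, Map<String, Integer>> byValue = new HashMap<>();
        for (String[] r : rows) {
            overall.merge(r[labelCol], 1, Integer::sum);
            byValue.computeIfAbsent(r[attr], k -> new HashMap<>())
                   .merge(r[labelCol], 1, Integer::sum);
        }
        // Start from the entropy of the whole set, then subtract the
        // size-weighted entropy of each partition induced by the attribute.
        double gain = entropy(toCounts(overall));
        for (Map<String, Integer> part : byValue.values()) {
            int[] counts = toCounts(part);
            int size = 0;
            for (int c : counts) {
                size += c;
            }
            gain -= (double) size / rows.length * entropy(counts);
        }
        return gain;
    }

    private static int[] toCounts(Map<String, Integer> m) {
        return m.values().stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        System.out.println("gain(id)        = " + informationGain(TOY, 0));
        System.out.println("gain(attribute) = " + informationGain(TOY, 1));
    }
}
```

On this toy table the unique-id column receives the maximum possible gain (about 1 bit), because every partition it creates holds a single record and is therefore pure, while the two-valued attribute receives a gain of only about 0.08 bits. Plain ID3 would split on the id column even though it has no predictive value, which is the bias toward many-valued attributes that C4.5 and the thesis's proposed algorithm aim to correct.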
Keywords/Search Tags: Decision Tree, ID3 algorithm, Entropy of Simplification, Entropy power of Simplification