Analysis and Development of the ID3 Decision Tree Algorithm

Posted on: 2011-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: R Zhang
Full Text: PDF
GTID: 2198330332467044
Subject: Computer technology

Abstract/Summary:
Data mining is a field that gives equal weight to theory and to practical application. Data classification is an important part of data mining, and decision tree algorithms are one of its main methods. Although many improvements and even new splitting methods have been proposed, each applies only under specific conditions and has its own limits, and none breaks away from the basic theoretical structure of the ID3 decision tree algorithm. Against this background, how to better analyze and improve the ID3 decision tree algorithm remains a meaningful problem that deserves careful study.

Based on actual production data from the State Administration of Foreign Exchange in Gansu Province, this thesis first dissects the ID3 decision tree algorithm and then analyzes its advantages and disadvantages. It then introduces a concept called the attribute structure similarity of sample data. The attribute structure similarity model and information gain are used together as the criterion for selecting non-leaf nodes during decision tree construction.

Compared with the original ID3 decision tree algorithm, the improved algorithm, SS_ID3, amends the multiple-valued problem. Both the original and the improved algorithms are examined theoretically and verified experimentally.

From the comparison of the theoretical analysis and the experimental results, the study concludes that the improved algorithm inherits the advantages of the original ID3 algorithm. Its change is confined to the selection criterion for non-leaf nodes, where the attribute structure similarity model acts together with information gain. The improved algorithm also makes progress on the multiple-valued problem, on controlling the size of the tree, and on classification and prediction performance. Finally, the thesis analyzes the original ID3 algorithm and the improved SS_ID3 algorithm with respect to the multiple-valued problem and decision tree construction, using the actual data to verify the theory in experiments.
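
The multiple-valued problem mentioned in the abstract can be illustrated with the standard ID3 selection criterion. The sketch below is not the SS_ID3 algorithm, because the abstract does not define how attribute structure similarity is computed. It only shows how plain information gain tends to favor an attribute with many distinct values. The toy records and the attribute names `record_id`, `amount`, and `flagged` are hypothetical and are not taken from the thesis data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Information gain of splitting `rows` on `attribute`, as used by ID3."""
    labels = [row[target] for row in rows]
    # Group the class labels by the value the attribute takes in each row.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: `record_id` is unique per row (many values),
# while `amount` has only two values.
rows = [
    {"record_id": 1, "amount": "high", "flagged": "yes"},
    {"record_id": 2, "amount": "high", "flagged": "yes"},
    {"record_id": 3, "amount": "low", "flagged": "no"},
    {"record_id": 4, "amount": "low", "flagged": "yes"},
    {"record_id": 5, "amount": "high", "flagged": "no"},
    {"record_id": 6, "amount": "low", "flagged": "no"},
]

gains = {a: information_gain(rows, a, "flagged") for a in ("record_id", "amount")}
best = max(gains, key=gains.get)  # attribute plain ID3 would pick as the root
print(gains, best)
```

In this toy case every `record_id` value isolates a single record, so the split leaves no remaining entropy and plain ID3 would choose it as the root even though it cannot generalize to new records. A selection criterion that combines information gain with a second measure, as the abstract describes for SS_ID3, is one way to counter this bias.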
Keywords/Search Tags: data mining, classification, ID3 algorithm, information gain, structure similarity of sample data, SS_ID3 algorithm