
The Research On The Algorithms Of Optimizing Decision Tree Classification

Posted on: 2011-07-31    Degree: Master    Type: Thesis
Country: China    Candidate: X L Wu
GTID: 2218330338966813    Subject: Computer application technology
Abstract/Summary:
Data mining is the process of extracting valid, potentially useful, and regular knowledge and information from large, incomplete, and noisy data; its task is to find patterns in data sets. It draws on the theory and technology of data warehousing, artificial intelligence, machine learning, statistics, and other fields. The classification and prediction techniques of data mining have been widely researched and applied in many fields, and they are therefore expected to have a far-reaching impact on commerce and everyday life. Since the 1960s, decision tree methods have been widely applied to classification, prediction, rule extraction, and related tasks. The best known of these is the ID3 algorithm, presented by Quinlan in 1986. This thesis focuses on the ID3 decision tree algorithm and its improvement.

First, the theoretical basis of ID3 and its process for building a decision tree are examined in further detail. Although ID3 is the best-known decision tree algorithm, it has two drawbacks. One is that its reliance on logarithms makes information entropy hard to calculate, so the computation is complex. The other is that it favors attributes with more distinct values, the so-called multi-value bias. To address the first drawback, the thesis uses Taylor's formula and Maclaurin's formula to simplify ID3; this reduces the number of steps needed to compute each attribute's information gain and makes the information entropy easier to compute. To address the multi-value bias, the thesis introduces, on top of the simplified attribute entropy, a function related to the values of an attribute. With these two optimizations, the new algorithm builds decision trees faster and also overcomes ID3's tendency to select attributes with more values.
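The two ID3 weaknesses named above, logarithm-heavy entropy and multi-value bias, can be illustrated with a short Python sketch on a hypothetical six-row training set. `entropy` and `information_gain` follow the standard ID3 definitions. `taylor_entropy` is an assumption for illustration only: it replaces ln(p) with p - 1 (the Maclaurin series of ln(1 + x) truncated after the first term, with x = p - 1), which turns entropy into 1 - Σp², the Gini impurity. The abstract does not give the thesis's actual simplified formula or its multi-value correction function, so neither is reproduced here; the attribute names `id` and `windy` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def taylor_entropy(labels):
    """Log-free approximation (illustrative assumption, not the thesis's formula).

    Using ln(p) ~ p - 1 (Maclaurin series of ln(1 + x) cut after x, x = p - 1),
    -sum(p * ln p) ~ sum(p * (1 - p)) = 1 - sum(p ** 2).
    """
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attr, impurity=entropy):
    """Impurity of the whole set minus the size-weighted impurity of the
    subsets produced by splitting on `attr`."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / n * impurity(s) for s in subsets.values())
    return impurity(labels) - remainder


# Hypothetical training set: "id" is unique per row, "windy" has two values.
rows = [
    {"id": 1, "windy": "no"},
    {"id": 2, "windy": "no"},
    {"id": 3, "windy": "no"},
    {"id": 4, "windy": "yes"},
    {"id": 5, "windy": "yes"},
    {"id": 6, "windy": "yes"},
]
labels = ["yes", "yes", "yes", "no", "no", "yes"]

for name, impurity in [("exact", entropy), ("taylor", taylor_entropy)]:
    g_id = information_gain(rows, labels, "id", impurity)
    g_windy = information_gain(rows, labels, "windy", impurity)
    print(name, round(g_id, 3), round(g_windy, 3))
# exact 0.918 0.459
# taylor 0.444 0.222
```

Under both criteria the unique `id` attribute receives the largest possible gain even though it is useless for predicting unseen rows. This is the multi-value bias, and the log-free approximation alone does not remove it, which is why a separate correction is needed.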
Then, using the same small training set, the decision trees produced by the algorithm before and after the improvement are analyzed and compared. Finally, following an object-oriented design, the thesis implements both ID3 and the improved algorithm in Java. The improved algorithm, ID3, and C4.5 are then run on data sets of different sizes. The simulation results show that the improved algorithm outperforms ID3 and C4.5 in both decision tree construction time and classification accuracy.
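Comparing the trees built before and after simplifying the entropy computation on one small training set can be sketched with a minimal recursive ID3 builder that takes the impurity function as a parameter. This is not the thesis's Java implementation. The training set, the attribute names `outlook` and `windy`, and the `taylor_entropy` approximation (1 - Σp², from ln(p) ≈ p - 1) are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits (the standard ID3 criterion)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def taylor_entropy(labels):
    """Assumed log-free approximation: 1 - sum(p ** 2), from ln(p) ~ p - 1."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gain(rows, labels, attr, impurity):
    """Impurity reduction obtained by splitting on `attr`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return impurity(labels) - sum(len(g) / n * impurity(g) for g in groups.values())


def build_tree(rows, labels, attrs, impurity=entropy):
    """Recursive ID3: a leaf is a class label, an inner node is
    (attribute, {value: subtree})."""
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]
    if not attrs:                      # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, labels, a, impurity))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest, impurity
        )
    return (best, branches)


def classify(tree, row):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


# Hypothetical small training set used for both builds.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes", "no", "yes"]

exact_tree = build_tree(rows, labels, ["outlook", "windy"], entropy)
approx_tree = build_tree(rows, labels, ["outlook", "windy"], taylor_entropy)
print(exact_tree[0], approx_tree[0], exact_tree == approx_tree)  # outlook outlook True
```

On this toy set both criteria choose `outlook` at the root and produce the same tree. `classify` raises `KeyError` for attribute values never seen during training; a fuller implementation would fall back to the majority class at that node.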
Keywords/Search Tags:Data Mining, ID3 algorithm, Decision tree, Multi-value bias, Information gain, Information entropy