Overfitting Problem Researching On Decision Tree

Posted on: 2009-12-29 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Wang | Full Text: PDF
GTID: 2178360245971697 | Subject: Computer application technology
Abstract/Summary:
Knowledge Discovery in Databases (KDD) is an active research area that draws on several fields, including artificial intelligence and databases. Classification is an important research topic within KDD. The decision tree is one of the most commonly used classification models, and it has been widely studied and applied since it was proposed in 1966. However, decision trees have drawbacks such as a bias toward attributes with many values and weak resistance to noise, so decision tree optimization has become a research hotspot. This dissertation focuses on two aspects, the analysis of suspect instances and the purity distance of nodes. The main contributions are as follows:
1. An overview and analysis of classical and optimized decision tree algorithms is presented.
2. The Improved C4.5rules Algorithm Based On Impact-Measurement Of The Suspect Instances is proposed. It effectively separates suspect instances from the original data and computes their impact measures from the information gains of their attributes. Building on this, the resulting classification rules avoid the influence of suspect instances and better reflect the true characteristics of the data.
3. Traditional decision tree algorithms suffer from severe over-fitting, and their pre-pruning depends on domain knowledge. To address these problems, the Decision Tree Pre-Pruning Based on PDN Trend algorithm is presented. Based on the purity distance of nodes, it determines when to stop growing the tree by monitoring the trend of the largest purity distance among the nodes. This achieves pre-pruning without relying on domain knowledge, avoids over-fitting, and noticeably reduces the size of the decision tree.
4. Based on the above research, an experimental system is implemented, and the algorithms are validated both experimentally and theoretically.
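Item 2 derives impact measures for suspect instances from the information gains of their attributes, but the abstract gives neither the detection step nor the formula that combines the gains. The Python sketch below shows only the standard ingredient, the information gain of each categorical attribute as used by ID3 (C4.5 itself ranks attributes by gain ratio, which divides this quantity by the split information). The helper names `entropy`, `information_gain`, and `attribute_gains` and the toy data are illustrative assumptions, not the thesis's code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on categorical attribute index `attr`."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def attribute_gains(rows, labels):
    """Information gain of every attribute (hypothetical input to an impact measure)."""
    return [information_gain(rows, labels, a) for a in range(len(rows[0]))]


# Toy data (illustrative): attribute 0 separates the classes, attribute 1 does not.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
labels = ["no", "no", "yes", "yes"]
print(attribute_gains(rows, labels))  # [1.0, 0.0]
```

How these per-attribute gains are turned into a per-instance impact measure is left open here, since the abstract does not specify it.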
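The abstract does not define the purity distance or the exact trend criterion of item 3, so the following Python sketch is only one hypothetical reading of it. Node purity is assumed to be the majority-class fraction, the purity distance of a split is assumed to be weighted child purity minus parent purity, and the tree is grown level by level, stopping once the largest purity distance on a level is zero or has fallen to at most `ratio` times the previous level's value. The names `purity_distance`, `grow_depth`, and `ratio` are illustrative assumptions, not the thesis's definitions.

```python
from collections import Counter


def purity(labels):
    """Majority-class fraction of a node (1.0 means the node is pure)."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def partition(rows, labels, attr):
    """Group (rows, labels) by the value of categorical attribute `attr`."""
    parts = {}
    for row, y in zip(rows, labels):
        part_rows, part_labels = parts.setdefault(row[attr], ([], []))
        part_rows.append(row)
        part_labels.append(y)
    return list(parts.values())


def purity_distance(rows, labels, attr):
    """Hypothetical purity distance: weighted child purity minus parent purity."""
    n = len(labels)
    children = sum(len(cl) / n * purity(cl) for _, cl in partition(rows, labels, attr))
    return children - purity(labels)


def grow_depth(rows, labels, ratio=0.5):
    """Grow breadth-first; return (levels split, per-level max purity distance).

    Hypothetical stop rule: halt when a level's largest purity distance is zero
    or has fallen to at most `ratio` times the previous level's value.
    """
    frontier = [(rows, labels, frozenset(range(len(rows[0]))))]
    trace = []
    while True:
        candidates = []
        for r, lab, attrs in frontier:
            if purity(lab) == 1.0 or not attrs:
                continue  # pure node or no attributes left: it stays a leaf
            pd, best = max((purity_distance(r, lab, a), a) for a in attrs)
            if pd > 0:
                candidates.append((pd, r, lab, best, attrs))
        level_max = max((c[0] for c in candidates), default=0.0)
        trace.append(level_max)
        if level_max == 0.0 or (len(trace) > 1 and level_max <= ratio * trace[-2]):
            return len(trace) - 1, trace  # this last level is not split
        frontier = [
            (child_rows, child_labels, attrs - {best})
            for _, r, lab, best, attrs in candidates
            for child_rows, child_labels in partition(r, lab, best)
        ]


rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [0, 0, 1, 1]
print(grow_depth(rows, labels))  # (1, [0.5, 0.0])
```

Because the stop decision is made before the next level is built, this is pre-pruning rather than post-pruning; a faithful implementation would need the thesis's own PDN definition and trend test in place of the assumed `purity_distance` and `ratio` rule.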
Keywords/Search Tags: KDD, suspect instances, purity of the node, over-fitting