
Research On Construction Of Cost-sensitive Decision Tree

Posted on: 2019-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2348330542489049
Subject: Computer Science and Technology

Abstract/Summary:
The decision tree algorithm holds an important position in the field of data mining because it is simple and efficient, extracts knowledge easily, and generates rules that are easy to understand. However, costs are ubiquitous in real life, and traditional decision tree algorithms cannot take cost requirements into account. It is therefore important to study algorithms that combine decision trees with cost-sensitive learning. Existing cost-sensitive decision tree algorithms still have many shortcomings. For example, the parameter values of the heuristic function used for attribute-node selection are difficult to determine; algorithms that perform well on small data sets become significantly less efficient on big data sets; and, in the absence of proper pruning strategies, the resulting decision tree models over-fit and generalize poorly. To overcome these shortcomings, this paper proposes the following improvements.

First, to address the high classification cost and the multi-valued attribute bias of existing cost-sensitive decision trees, the heuristic function of the CS-C4.5 algorithm is introduced and an improvement to it is given. The improved heuristic function degenerates into that of the C4.5 algorithm when an attribute is tested again. Experiments show that the optimized algorithm takes the classification ability of the model itself, the test cost, and the misclassification cost into account.

Second, inspired by the "probability-persist-pruning" strategy, this paper proposes its dual, "probability-refusal-pruning". The idea of this pruning strategy is that, when the pruning rules indicate the decision tree should be pruned, the algorithm still refuses to prune with a certain probability. Comparative experiments show that the "probability-refusal-pruning" strategy further reduces the average classification cost, alleviates the over-fitting problem, and improves the generalization ability of the decision tree model.

Third, this paper introduces an adaptive selecting-the-cut-point (ASCP) mechanism and an adaptive removing-attribute (ARA) mechanism for cost-sensitive decision tree algorithms, which are inefficient on high-dimensional and unbalanced data sets. The ASCP mechanism greatly reduces computational complexity and improves the efficiency of the algorithm. The ARA mechanism adaptively removes attributes that have little impact on the decision tree during tree building, simplifying the attribute-selection process. Comparative experiments show that introducing these two mechanisms greatly improves the efficiency of decision tree construction and yields better performance on big data sets.
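The "probability-refusal-pruning" decision described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the cost estimates passed in, the function name, and the rejection probability `p_refuse` are all illustrative assumptions.

```python
import random

def should_prune(subtree_cost, leaf_cost, p_refuse=0.3, rng=random):
    """Decide whether to prune a node under 'probability-refusal-pruning'.

    subtree_cost: estimated average classification cost if the subtree is kept
    leaf_cost:    estimated cost if the node is collapsed into a leaf
    p_refuse:     probability of refusing to prune even when the rule says prune
    """
    # Standard cost-based pruning rule: prune when collapsing to a leaf
    # is no more costly than keeping the subtree.
    rule_says_prune = leaf_cost <= subtree_cost
    if not rule_says_prune:
        return False
    # Refusal step: even though the rule says prune, keep the subtree
    # with probability p_refuse, so pruning is occasionally rejected.
    return rng.random() >= p_refuse
```

With `p_refuse=0` this reduces to ordinary cost-based pruning; raising `p_refuse` makes the tree retain more subtrees, which is the mechanism the abstract credits with reducing average classification cost while countering over-fitting.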
Keywords/Search Tags:Cost-sensitive Learning, Decision Tree, Adaptive Mechanism, Big Data Sets