
Research On Unit Cost Gains Sensitive Decision Tree Classification And Pruning Algorithm

Posted on: 2017-05-09 | Degree: Master | Type: Thesis
Country: China | Candidate: M Q Zhou | Full Text: PDF
GTID: 2308330488975458 | Subject: Computer application technology
Abstract/Summary:
Data mining is an interdisciplinary field that draws on many areas of research. Because of their wide application in the information industry, classification algorithms have gradually become a research focus within data mining. Common classification techniques include decision tree classification, Bayesian classification, neural networks, Support Vector Machines (SVM), k-nearest neighbor classification, and so on. Among these, decision tree methods are favored by researchers for their speed, high accuracy, and ease of interpretation, and they have been studied and applied widely in data mining.

Decision tree classification algorithms have four main design points: the extended (splitting) attribute selection criterion, the stopping criterion for tree construction, the rule for assigning class labels to leaf nodes, and the pruning optimization strategy. Current research concentrates on two of these, the attribute selection criterion and the pruning strategy, and this thesis addresses both. There are two basic reasons to prune a decision tree. First, noise in the training data can cause the generated tree to overfit the training samples, so that its classification performance on new data is poor. Second, the training samples may have a peculiar distribution, in which case the generated tree fails to capture general rules. The thesis systematically presents pre-pruning and post-pruning algorithms and compares several common post-pruning algorithms.

The emergence of cost-sensitive classification learning has pushed research on decision tree classification to a new level.
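The post-pruning comparison mentioned above can be illustrated with a minimal sketch of reduced-error pruning (REP), one of the common post-pruning algorithms. The tree encoding and helper names here are illustrative assumptions, not structures taken from the thesis:

```python
def majority_class(labels):
    """Most frequent class label among the training samples at a node."""
    return max(set(labels), key=labels.count)

def classify(node, x):
    """Route a sample dict `x` down the (possibly pruned) tree."""
    while isinstance(node, dict):
        node = node["children"].get(x.get(node["attr"]),
                                    majority_class(node["labels"]))
    return node

def rep_prune(node, validation):
    """Bottom-up REP: collapse a subtree to a majority leaf whenever the
    leaf classifies the validation samples at least as well.
    `node` is a class label (leaf) or a dict:
      {"attr": ..., "children": {value: subtree}, "labels": [...]}
    `validation` is a list of (features_dict, label) pairs."""
    if not isinstance(node, dict):
        return node
    # Prune children first, routing validation samples by attribute value.
    for value, child in node["children"].items():
        subset = [(x, y) for x, y in validation
                  if x.get(node["attr"]) == value]
        node["children"][value] = rep_prune(child, subset)
    # Compare keeping the subtree against collapsing it to a leaf.
    leaf = majority_class(node["labels"])
    kept = sum(classify(node, x) == y for x, y in validation)
    collapsed = sum(leaf == y for _, y in validation)
    return leaf if collapsed >= kept else node
```

Pruning proceeds bottom-up so that each subtree is evaluated only after its own children have already been simplified.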
However, most of that research aims at improving the extended attribute selection criterion; little work combines cost-sensitive learning with pruning algorithms. Moreover, the goal of cost-sensitive learning is to minimize cost, which ignores the gains generated during decision making. In investment, for example, aggressive investors are willing to sacrifice some cost in exchange for the greatest gains. For application environments in which costs and gains coexist, this thesis proposes a decision tree classification algorithm based on unit cost gains, which maximizes gains at a given cost. Building on unit cost gains, the thesis further proposes two pruning algorithms, and finally demonstrates the feasibility and practicality of the algorithms through experiments. The main contributions are as follows.

(1) A decision tree classification algorithm based on unit cost gains (UCGS), designed for environments where costs and gains coexist. To compensate for the absence of correct-classification gains in cost-sensitive learning, the thesis introduces a cost-gain matrix based on cost-gain decision theory from economics. It constructs a new extended attribute selection criterion that uses a harmonic function to balance attribute information gain against cost performance, and it replaces the traditional majority-voting rule for labeling leaf nodes with a unit-cost-gain-maximization rule. To verify the practicality and validity of the model, the experimental analysis has three parts. The first part compares C4.5 and CS_C4.5 (a cost-sensitive learning algorithm) with UCGS.
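The two ingredients of contribution (1) can be sketched as follows. The abstract does not give the exact harmonic function or the encoding of the cost-gain matrix, so the weighting and the `cg[pred][actual] = (cost, gain)` layout below are illustrative assumptions following the description:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(l) for l in set(labels)))

def info_gain(labels, splits):
    """Entropy reduction from partitioning `labels` into `splits`."""
    n = len(labels)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in splits)

def unit_cost_gain(gain, cost):
    """Gains obtained per unit of cost paid -- the thesis's core quantity."""
    return gain / cost if cost > 0 else 0.0

def split_score(labels, splits, gain, cost):
    """Harmonic-mean-style balance of information gain and unit cost gain
    (the thesis's actual harmonic function may differ)."""
    ig = info_gain(labels, splits)
    ucg = unit_cost_gain(gain, cost)
    return 2 * ig * ucg / (ig + ucg) if ig + ucg > 0 else 0.0

def leaf_label(labels, cg):
    """Label a leaf by unit cost gain maximization instead of majority
    voting. `cg[pred][actual] = (cost, gain)` is an assumed encoding of
    the cost-gain matrix."""
    def ucg_for(pred):
        cost = sum(cg[pred][y][0] for y in labels)
        gain = sum(cg[pred][y][1] for y in labels)
        return unit_cost_gain(gain, cost)
    return max(cg, key=ucg_for)
```

With a matrix that rewards correctly predicting the positive class heavily, a leaf holding `["pos", "neg", "neg"]` can be labeled "pos" even though majority voting would say "neg": the expected gain per unit cost is what decides.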
The experimental results show that UCGS obtains the largest gains at the same cost. The second part shows that UCGS also classifies imbalanced data well, which gives it practical value. In the third part, UCGS shows a clear advantage when compared with three other cost-sensitive algorithms. Overall, the algorithm not only achieves the decision goal of obtaining the highest gain at the minimum cost but also maintains high classification accuracy, so it can solve practical problems in application environments where costs and gains coexist.

(2) A unit cost gains pruning algorithm combined with a pre-pruning strategy. The algorithm applies the unit cost gains pruning criterion within a pre-pruning framework while the tree is generated, which gives the tree cost-sensitive behavior. Experimental results show that the trees it generates are smaller than those produced by the REP and EBP pruning algorithms, and it performs particularly well on multi-label datasets while retaining good classification accuracy, improving the tree's predictive accuracy to some extent. In addition, by adjusting the cost-gain matrix, users can obtain the classification tree that fits their needs, which removes a rigidity of the original algorithm and improves its flexibility.

(3) A unit cost gains pruning algorithm based on cost complexity (UCG-CCP for short).
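Before turning to UCG-CCP, the pre-pruning test in contribution (2) could look like the following sketch: a node is expanded only if splitting improves the achievable unit cost gain over the best single-leaf label. The threshold logic and the `cg[pred][actual] = (cost, gain)` matrix encoding are illustrative assumptions, not the thesis's exact rule:

```python
def node_ucg(labels, pred, cg):
    """Unit cost gain of labelling all samples at a node `pred`;
    cg[pred][actual] = (cost, gain) is an assumed encoding."""
    cost = sum(cg[pred][y][0] for y in labels)
    gain = sum(cg[pred][y][1] for y in labels)
    return gain / cost if cost > 0 else 0.0

def should_expand(labels, splits, cg):
    """Pre-pruning test: expand the node only if the weighted best unit
    cost gain over the split's branches beats the best single leaf."""
    best_leaf = max(node_ucg(labels, p, cg) for p in cg)

    def branch_ucg(branch):
        return max(node_ucg(branch, p, cg) for p in cg)

    n = len(labels)
    split_ucg = sum(len(s) / n * branch_ucg(s) for s in splits if s)
    return split_ucg > best_leaf
```

A pure node never passes the test (splitting cannot raise its unit cost gain), so growth stops there, which is exactly the effect a pre-pruning strategy is meant to have.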
This algorithm combines the unit cost gains pruning strategy with a cost-complexity measure: it computes a pruning factor β and selects the subtree with the minimum β value as the final optimal decision tree. The experimental results show that trees pruned with UCG-CCP preserve classification accuracy while further reducing complexity, and the generated classification model is more concise and clearer than a tree pruned with the standard CCP algorithm.
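In classic cost-complexity pruning (CCP), each internal node t is assigned a factor α = (R(leaf) − R(T_t)) / (|leaves(T_t)| − 1), and the weakest link (minimum α) is pruned first. The thesis replaces the error-based measure with a unit-cost-gain-based one and calls the factor β; the definition below is an assumption that simply follows the CCP template with the losses supplied by the caller:

```python
def count_leaves(node):
    """Leaves of a subtree; a non-dict node is a leaf."""
    if not isinstance(node, dict):
        return 1
    return sum(count_leaves(c) for c in node["children"].values())

def beta(node, leaf_loss, subtree_loss):
    """Pruning factor: loss increase per leaf removed when the subtree
    at `node` is collapsed to a single leaf. Here `leaf_loss` and
    `subtree_loss` are assumed to be unit-cost-gain-based losses."""
    return (leaf_loss - subtree_loss) / (count_leaves(node) - 1)

def weakest_link(candidates):
    """`candidates`: list of (node, leaf_loss, subtree_loss) triples.
    Returns the node with minimum beta -- the one pruned first."""
    return min(candidates, key=lambda c: beta(*c))[0]
```

Because β divides the loss increase by the number of leaves removed, a large subtree whose removal costs the same as a small one always loses: pruning it simplifies the model more per unit of loss, which is what drives the more concise trees reported above.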
Keywords/Search Tags: Decision tree, Classification, Cost sensitive, Unit cost gains, Pruning