Research On Cost-sensitive Decision Tree For Uncertain Data

Posted on: 2013-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: M J Liu
GTID: 2248330374468362
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, uncertain data is widely observed in fields such as market analysis, sensor networks, and environmental monitoring, and it plays a key role in many applications. Traditional methods designed for certain data handle uncertain data poorly and cannot meet application requirements, so data uncertainty has attracted the attention of the research community. Cost-sensitive learning, however, has not yet appeared in research on uncertain data. Cost-sensitive learning is a research direction of great significance: by introducing the concept of cost, it overcomes the limitation that traditional classifiers focus only on accuracy, and it aims to minimize total cost. In many scenarios a cost-sensitive model is the more reasonable choice.

This thesis therefore realizes cost-sensitive learning for uncertain data. By incorporating methods for processing uncertainty, we extend a traditional cost-sensitive classification algorithm to uncertain data. The study is novel because it both extends the research direction of uncertain data and optimizes the cost-sensitive classification model, bringing it closer to real-world applications. The main contributions are as follows.

First, we propose CSDTU, a cost-sensitive decision tree algorithm for uncertain data. The algorithm applies the way uncertain decision trees handle uncertain data to the traditional cost-sensitive decision tree. Based on the concept of probabilistic cardinality (PC), we define the selection criterion for the splitting attribute and compute the costs involved in building the tree. Using the classification method of uncertain decision trees, CSDTU classifies an instance by using all of the tree paths that the uncertain instance passes through. It can perform cost-sensitive classification for both certain and uncertain data.

Second, we extend the single batch test algorithm of the traditional cost-sensitive decision tree from certain data to uncertain data. Because the data is uncertain, the ordinary test procedure of CSDTU performs too many tests, which wastes considerable cost. The single batch test builds on cost-sensitive learning and chooses an appropriate batch of attributes to test, which effectively reduces the total cost of classification and improves classifier performance.

In the experiments, we select data sets from the UCI database that suit the characteristics of the classifier. To verify the performance of the designed classifier, we compare CSDTU with the traditional uncertain decision tree DTU. Furthermore, we convert DTU into a cost-sensitive algorithm, DTU-C, using a method from related research, and apply Laplace pruning to improve classifier performance. We compare our algorithm with DTU-C and with its Laplace-pruned variant. The experimental results show that, under different parameter settings, CSDTU always has a lower total cost than the DTU algorithm. CSDTU also handles certain data (uncertainty of 0) well, and it remains stable even at high uncertainty, which supports the rationality and effectiveness of the approach.

The experiments on the uncertain single batch test (USBT) are based on CSDTU and DTU. The results show that USBT significantly improves the performance of CSDTU and reduces the total cost of classification. For the non-cost-sensitive algorithm DTU, however, USBT has no effect. This illustrates that USBT, being based on cost sensitivity, can only improve cost-sensitive classifiers.
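The abstract does not give the definitions used in CSDTU, so the following Python sketch is only an illustration of the general ideas it names, not the thesis's algorithm. It assumes that probabilistic cardinality is the sum of per-instance value probabilities (as is common in the uncertain decision tree literature), that an uncertain categorical attribute is a mapping from values to probabilities, and that costs are given as a matrix `cost[predicted][actual]`. All function names, the tree layout, and the data format are hypothetical.

```python
from collections import defaultdict


def probabilistic_cardinality(instances, attribute):
    """Assumed PC: for each (attribute value, class), sum the probability
    that each instance of that class takes that value."""
    pc = defaultdict(lambda: defaultdict(float))
    for inst in instances:
        for value, prob in inst["attrs"][attribute].items():
            pc[value][inst["label"]] += prob
    return pc


def min_cost_label(class_mass, cost):
    """Return the label with the lowest expected misclassification cost,
    where cost[predicted][actual] is the cost of that outcome."""
    def expected_cost(predicted):
        return sum(cost[predicted][actual] * class_mass.get(actual, 0.0)
                   for actual in cost[predicted])
    best = min(cost, key=expected_cost)
    return best, expected_cost(best)


def classify_all_paths(node, attrs, weight=1.0, out=None):
    """Send an uncertain instance down every branch it may follow,
    weighting each leaf's class distribution by the path probability."""
    if out is None:
        out = defaultdict(float)
    if "dist" in node:  # leaf: accumulate its weighted class distribution
        for label, p in node["dist"].items():
            out[label] += weight * p
        return out
    for value, prob in attrs[node["attr"]].items():
        child = node["children"].get(value)
        if child is not None and prob > 0:
            classify_all_paths(child, attrs, weight * prob, out)
    return out


if __name__ == "__main__":
    # Hypothetical two-leaf tree and cost matrix, for illustration only.
    tree = {"attr": "color", "children": {
        "red": {"dist": {"pos": 0.9, "neg": 0.1}},
        "blue": {"dist": {"pos": 0.2, "neg": 0.8}},
    }}
    cost = {"pos": {"pos": 0, "neg": 1}, "neg": {"pos": 5, "neg": 0}}
    dist = classify_all_paths(tree, {"color": {"red": 0.5, "blue": 0.5}})
    print(dict(dist), min_cost_label(dist, cost))
```

With an asymmetric cost matrix like the one in the demo, the minimum-cost label can differ from the most probable label, which is the sense in which a cost-sensitive classifier minimizes total cost rather than error rate.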
Keywords/Search Tags:uncertain data, cost sensitive, decision tree, uncertain single batch, classifier