Research On Decision Tree For Mining Uncertain Data With PU-Learning

Posted on: 2013-10-23 | Degree: Master | Type: Thesis
Country: China | Candidate: X Zhang | Full Text: PDF
GTID: 2248330374468361 | Subject: Computer software and theory
Abstract/Summary:
In recent years, with the development of communication technology and the Internet, a kind of data has appeared in many applications that has the following characteristics:

(1) All labeled samples are positive examples (the particular class that users focus on, which we call the target class). The other samples are unlabeled and may or may not be positive. Labeling samples is time-consuming and costly, so when the data set is huge it is unrealistic to label every sample. As a special kind of semi-supervised learning, PU learning requires only a small number of positive examples and a large number of unlabeled examples. It can save a lot of time and cost; however, it may reduce classification performance.

(2) The data contains uncertainty. Uncertain data appears in many applications, and many factors contribute to the uncertainty, such as the random nature of the physical data generation and collection process, measurement errors, and decision errors.

This thesis discusses how to handle data with both characteristics, which we call uncertain data with PU-learning. The research question is how to build a decision tree that can deal with uncertain data with PU-learning. There are two main contributions:

(1) A decision tree for uncertain categorical data with PU-learning (DTUC-PU). We propose PU probabilistic information gain and express uncertain categorical data as probabilities. Based on the information gain algorithm in POSC4.5, we build 9 decision trees and choose the best one as the result. Experimental results on UCI datasets demonstrate that the proposed algorithm has good classification accuracy and is robust against data uncertainty.

(2) A decision tree for uncertain numerical data with PU-learning (DTUN-PU). Based on the information gain algorithm in POSC4.5, and considering the uncertain data interval and probability distribution proposed in UDT, we propose the decision tree algorithm DTUN-PU (Decision Tree for Uncertain Numerical data with PU-Learning), which can handle data with uncertain numerical attributes. Experimental results on UCI datasets demonstrate that the proposed algorithm has good classification accuracy and is robust against data uncertainty.
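The abstract does not give the formula for PU probabilistic information gain, so the following Python sketch is only one plausible reading, not the thesis's definition. It assumes a POSC4.5-style estimate of the positive rate at a node from positive and unlabeled counts, with hard counts replaced by sums of probabilities for uncertain categorical values. For numerical attributes it assumes a uniform distribution over the uncertainty interval. The names `positive_rate`, `pu_probabilistic_gain`, `mass_below` and the parameter `prior` (an assumed positive-class prior) are hypothetical and do not come from the thesis.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-class distribution with positive rate p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def positive_rate(pos_weight, unl_weight, pos_total, unl_total, prior):
    """Assumed POSC4.5-style estimate of P(positive) at a node.

    pos_weight / unl_weight are the (possibly fractional) numbers of
    positive / unlabeled samples that reach the node.
    """
    if unl_weight == 0 or pos_total == 0:
        return 0.0
    estimate = (pos_weight / pos_total) * prior * (unl_total / unl_weight)
    return min(1.0, estimate)


def pu_probabilistic_gain(pos, unl, prior):
    """Information gain of one uncertain categorical attribute.

    pos, unl: lists of dicts mapping attribute value -> probability,
    one dict per positive / unlabeled sample (hypothetical encoding).
    """
    pos_total, unl_total = len(pos), len(unl)
    parent = binary_entropy(
        positive_rate(pos_total, unl_total, pos_total, unl_total, prior)
    )
    values = {v for dist in pos + unl for v in dist}
    children = 0.0
    for v in values:
        # Fractional counts: each sample contributes its probability of v.
        pw = sum(dist.get(v, 0.0) for dist in pos)
        uw = sum(dist.get(v, 0.0) for dist in unl)
        p = positive_rate(pw, uw, pos_total, unl_total, prior)
        children += (uw / unl_total) * binary_entropy(p)
    return parent - children


def mass_below(low, high, split):
    """P(value <= split) for a value uniform on [low, high] (assumption)."""
    if high == low:
        return 1.0 if low <= split else 0.0
    return min(1.0, max(0.0, (split - low) / (high - low)))


# Categorical attribute: positives are certainly "a", most unlabeled are "b".
pos = [{"a": 1.0}, {"a": 1.0}]
unl = [{"a": 1.0}, {"b": 1.0}, {"b": 1.0}, {"b": 1.0}]
gain = pu_probabilistic_gain(pos, unl, prior=0.25)

# Numerical attribute: turn an interval into a two-branch distribution.
m = mass_below(0.0, 10.0, 2.5)
branches = {"<=": m, ">": 1.0 - m}
```

A numerical split can reuse `pu_probabilistic_gain` by converting each interval into a `{"<=": m, ">": 1 - m}` distribution, as in the last lines. The abstract states that nine trees are built and the best one is chosen; varying `prior` is one way to produce several candidate trees, but the abstract does not say that this is how the nine trees differ.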
Keywords/Search Tags: PU Learning, Uncertain, Decision Trees