
Research On Uncertain Data Classification Based On PU Learning And Bayesian Network

Posted on: 2018-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: H X Gan
Full Text: PDF
GTID: 2348330512986874
Subject: Software engineering
Abstract/Summary:
Data uncertainty is widespread in real-world applications such as location-based services, wireless sensor networks, and medical diagnosis. It has many causes, including imprecise measurement, network delay, outdated sources, sampling error, and privacy protection. Traditional classification requires labeled examples from all classes, but collecting them is costly in real-life applications and sometimes impossible, whereas large numbers of unlabeled examples are easy to obtain. In some binary classification problems, only a subset of labeled positive examples (the class of interest) and abundant unlabeled examples are available. PU Learning (Positive Unlabeled Learning) refers to learning from a training set that consists only of a subset of labeled positive examples together with unlabeled examples. This setting is common in real-life applications such as credit fraud detection and text classification. PU Learning has recently drawn attention in the research community, but most work focuses on precise data. Currently, UPNB is the only algorithm for classifying uncertain data under the PU Learning scenario (He et al. 2010). Its conditional independence assumption does not always hold in real-life applications, which degrades UPNB's classification performance. To address this problem, this research studies the classification of uncertain data under the PU Learning scenario using Bayesian networks. The major research contents and achievements are as follows:

(1) The calculation of mutual information between uncertain attributes under the PU Learning scenario is studied, and a conditional mutual information measure, UCMI, is proposed for this purpose. The traditional TAN (Tree Augmented Naïve Bayes) Bayesian network classification algorithm can learn the TAN network structure (a tree structure) only from fully labeled precise data. This research modifies the mutual information calculation for the PU Learning scenario using probability cardinality, yielding UCMI, which measures the dependency among uncertain attributes so that the parent attribute of every attribute can be found and the Bayesian network structure determined.

(2) The use of Bayesian networks to classify uncertain data under the PU Learning scenario is studied. UPTAN (Uncertain Positive Tree Augmented Naïve Bayes), a Bayesian network classification algorithm, is proposed; it outperforms UPNB, the state-of-the-art naïve Bayes based algorithm in this area. To learn the network structure, UPTAN uses UCMI to compute the dependency among uncertain attributes and thereby determines the TAN structure from uncertain data under the PU Learning scenario. To learn the parameters of the classifier, UPTAN extends the parameter learning method of PTAN (Positive Tree Augmented Naïve Bayes), a PU Learning algorithm, by replacing frequency statistics with probability cardinalities so that it handles both uncertain data and the PU Learning scenario.

(3) To evaluate the classification performance of UPTAN and the impact of uncertain attributes on it, experiments were conducted on 20 UCI datasets. The results show that UPTAN outperforms UPNB, a state-of-the-art algorithm under this scenario, by 3.37% (using F1 as the performance indicator), and that the more representative the uncertain attributes are and the more uncertainty the data contain, the larger the drop in UPTAN's classification performance.
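
The idea of replacing frequency counts with probability cardinalities can be illustrated with a minimal Python sketch. It is not the thesis's UCMI definition, which the abstract does not give. It assumes that every uncertain categorical attribute is stored as a value-to-probability mapping, that the uncertainties of two attributes within one example are independent, and that class labels are known for all examples (the PU Learning setting additionally has to estimate statistics for the unlabeled data, which is omitted here). The function names `expected_counts` and `conditional_mutual_information` are hypothetical.

```python
import math
from collections import defaultdict


def expected_counts(examples, i, j):
    """Probability cardinalities of (value_i, value_j, label) triples.

    Each example is (attrs, label), where attrs[k] maps every possible
    value of attribute k to the probability that it is the true value.
    """
    joint = defaultdict(float)
    for attrs, label in examples:
        for a, pa in attrs[i].items():
            for b, pb in attrs[j].items():
                # Assumption: the two attributes' uncertainties are
                # independent within a single example.
                joint[(a, b, label)] += pa * pb
    return joint


def conditional_mutual_information(examples, i, j):
    """I(X_i; X_j | C) computed from expected counts instead of frequencies."""
    joint = expected_counts(examples, i, j)
    n = sum(joint.values())
    n_c = defaultdict(float)
    n_ac = defaultdict(float)
    n_bc = defaultdict(float)
    for (a, b, c), w in joint.items():
        n_c[c] += w
        n_ac[(a, c)] += w
        n_bc[(b, c)] += w
    cmi = 0.0
    for (a, b, c), w in joint.items():
        if w > 0:
            # P(a,b|c) / (P(a|c) * P(b|c)) = w * n_c / (n_ac * n_bc)
            ratio = w * n_c[c] / (n_ac[(a, c)] * n_bc[(b, c)])
            cmi += (w / n) * math.log(ratio)
    return cmi


# Two attributes that always agree, observed without uncertainty.
certain = [
    ([{0: 1.0}, {0: 1.0}], "pos"),
    ([{1: 1.0}, {1: 1.0}], "pos"),
]
# The same two examples, but each value is a 50/50 guess.
blurred = [
    ([{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}], "pos"),
    ([{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}], "pos"),
]

cmi_certain = conditional_mutual_information(certain, 0, 1)  # log 2
cmi_blurred = conditional_mutual_information(blurred, 0, 1)  # 0
```

In this toy data the fully certain attributes yield a conditional mutual information of log 2, while the maximally uncertain version yields 0, which matches the intuition that more uncertainty leaves less measurable dependency between attributes. In a TAN-style learner, pairwise scores of this kind would serve as edge weights when building the attribute tree.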
Keywords/Search Tags: positive unlabeled learning, uncertain data, Bayesian network classifier