
Bayesian Classifier For Positive Unlabeled Learning With Uncertainty

Posted on: 2013-02-07  Degree: Master  Type: Thesis
Country: China  Candidate: J Z He  Full Text: PDF
GTID: 2218330374968361  Subject: Computer software and theory
Abstract/Summary:
Traditional classification algorithms require a large number of labeled examples from all the predefined classes, which are generally expensive and time-consuming to obtain in practice. PU Learning (Positive Unlabeled Learning), which learns from positive and unlabeled examples only, arises in many real-life problems and has been widely investigated recently. However, most existing work focuses on certain (precise) data. Data uncertainty is prevalent in many real-world applications, such as sensor networks, market analysis and medical diagnosis: because of imprecise measurement, outdated sources or decision errors, the precise value of a data item may be unknown. It is therefore important to study uncertain data classification when only positive and unlabeled examples are available. This thesis studies the problem of uncertain data classification in the PU Learning scenario. The main contents and results are:

(1) We pose the problem of uncertain data classification when only positive and unlabeled examples are available, and address it with the naive Bayes classifier. Starting from PNB (Positive Naive Bayes), a PU Learning algorithm for certain categorical data, we extend it to uncertain categorical data by means of the concept of probability cardinality. Likewise, starting from FBC (Formula-based method), a naive Bayes algorithm for uncertain numeric data, we extend it to handle uncertain numeric data when only positive and unlabeled examples are available. The experimental results demonstrate that our algorithms, which exploit the uncertainty in the dataset, can potentially achieve better classification performance than traditional naive Bayes, which ignores uncertainty, when handling uncertain data in the PU Learning scenario.

(2) When building a naive Bayes classifier in the PU Learning scenario, the prior probability of the positive class is a parameter that the user must provide. An approach that estimates this parameter automatically is needed.
We adopt two approaches to estimate the prior probability of the positive class. The first evaluates the resulting classifier on a validation set that contains only positive and unlabeled examples, using a performance measure similar to F1, and searches the values {0.1, 0.2, ..., 0.9} for the one with which the resulting classifier achieves the best classification performance on the validation set; that value is taken as the prior probability of the positive class. The second uses the method proposed by Elkan and Noto (2008), under the "selected completely at random" assumption, to estimate the prior probability of the positive class directly from the positive and unlabeled examples, which then helps build the naive Bayes classifier in the PU Learning scenario. The experimental results show that both approaches achieve largely satisfactory classification performance on uncertain data while freeing the user from providing the prior probability of the positive class.
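The extension of PNB to uncertain categorical data in (1) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that each uncertain attribute is given as a probability distribution over categories, that the probability cardinality of a value is the sum of those probabilities over a set of examples, and that the negative-class conditionals are recovered from the unlabeled set as in PNB. The Laplace smoothing, the clipping constant `eps`, the expected-likelihood prediction rule and all function names are assumptions made for illustration.

```python
from math import log

def prob_cardinality(examples, attr, value):
    """Sum, over all examples, of the probability that attribute `attr` equals `value`."""
    return sum(ex[attr].get(value, 0.0) for ex in examples)

def fit_pnb(positives, unlabeled, domains, prior_pos, eps=1e-6):
    """Fit a PNB-style model; domains[j] lists the categories of attribute j.

    Each example is a list with one dict per attribute: {category: probability}.
    """
    model = {"prior": prior_pos, "pos": [], "neg": []}
    for j, values in enumerate(domains):
        k = len(values)
        # P(v | positive) from probability cardinalities, Laplace-smoothed (assumption).
        pos = {v: (prob_cardinality(positives, j, v) + 1.0) / (len(positives) + k)
               for v in values}
        # P(v | negative) = (P(v) - P(v | positive) * prior) / (1 - prior),
        # with P(v) estimated on the unlabeled set; clipped at eps (assumption).
        neg_raw = {}
        for v in values:
            p_v = prob_cardinality(unlabeled, j, v) / len(unlabeled)
            neg_raw[v] = max((p_v - pos[v] * prior_pos) / (1.0 - prior_pos), eps)
        z = sum(neg_raw.values())
        model["pos"].append(pos)
        model["neg"].append({v: q / z for v, q in neg_raw.items()})
    return model

def predict_positive(model, example):
    """Return True if the uncertain example is more likely positive than negative."""
    p = model["prior"]
    log_pos, log_neg = log(p), log(1.0 - p)
    for j, dist in enumerate(example):
        # Expected likelihood of the uncertain attribute value under each class.
        log_pos += log(sum(w * model["pos"][j][v] for v, w in dist.items()))
        log_neg += log(sum(w * model["neg"][j][v] for v, w in dist.items()))
    return log_pos > log_neg
```

In this sketch `prior_pos` has to be supplied by the caller, which is the limitation that contribution (2) addresses.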
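The two prior-estimation approaches in (2) can be sketched as follows, under stated assumptions. The abstract does not name the F1-like measure; the sketch uses r^2 / Pr[f(x) = 1], a criterion often attributed to Lee and Liu (2003), where r is recall on the validation positives, and this choice is an assumption. For the second approach, the sketch uses the Elkan and Noto estimate c = p(s=1 | y=1) as the mean classifier score on held-out labeled positives, together with the identity p(y=1) = p(s=1) / c. How the thesis maps this onto the prior used by its naive Bayes classifier is not stated in the abstract. The callback `train_and_predict` and all function names are hypothetical.

```python
def pu_score(pred_on_positives, pred_on_all):
    """F1-like PU criterion r^2 / Pr[f(x)=1]; predictions are 0/1 lists."""
    recall = sum(pred_on_positives) / len(pred_on_positives)
    frac_predicted_pos = sum(pred_on_all) / len(pred_on_all)
    if frac_predicted_pos == 0:
        return 0.0
    return recall ** 2 / frac_predicted_pos

def select_prior(train_and_predict, candidates=None):
    """Grid search over candidate priors 0.1, 0.2, ..., 0.9.

    `train_and_predict(p)` is a hypothetical callback: it trains a classifier
    with prior p and returns (predictions on validation positives,
    predictions on the whole validation set).
    """
    if candidates is None:
        candidates = [k / 10 for k in range(1, 10)]
    return max(candidates, key=lambda p: pu_score(*train_and_predict(p)))

def elkan_noto_c(scores_on_positives):
    """Estimate c = p(s=1 | y=1) as the mean score g(x) on held-out labeled positives.

    g is a probabilistic classifier trained to separate labeled from unlabeled examples.
    """
    if not scores_on_positives:
        raise ValueError("need at least one labeled positive example")
    return sum(scores_on_positives) / len(scores_on_positives)

def prior_from_c(n_labeled, n_total, c):
    """p(y=1) = p(s=1) / c, with p(s=1) estimated as the labeled fraction."""
    return min(1.0, (n_labeled / n_total) / c)
```

The grid search costs one training run per candidate value, while the Elkan and Noto estimate needs only one auxiliary classifier but relies on the "selected completely at random" assumption.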
Keywords/Search Tags: uncertain data, Bayesian classification, positive unlabeled learning, naive Bayes