Bayesian Classifier For Positive Unlabeled Learning With Uncertainty

Posted on:2013-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:J Z He

GTID:2218330374968361

Subject:Computer software and theory

Abstract/Summary:

Traditional classification algorithms require a large number of labeled examples from all the predefined classes, which are generally expensive and time-consuming to obtain in practice. PU Learning (Positive Unlabeled Learning) learns from positive and unlabeled examples only. It arises in many real-life problems and has been widely investigated in recent years. However, most of this work focuses on certain (precise) data. Data uncertainty is prevalent in many real-world applications, such as sensor networks, market analysis and medical diagnosis. Because of imprecise measurement, outdated sources or decision errors, the precise value of the data may be unknown. It is therefore important to explore uncertain data classification when only positive and unlabeled examples are available. This thesis studies uncertain data classification in the PU Learning scenario. Its main contents and results are as follows.

(1) We pose the problem of classifying uncertain data when only positive and unlabeled examples are available, and we address it with the naive Bayes classifier. PNB (Positive Naive Bayes) is a PU Learning algorithm for certain categorical data. We extend it to uncertain categorical data by means of the concept of probability cardinality. FBC (formula-based method) is a naive Bayes algorithm for uncertain numeric data. We extend it to handle uncertain numeric data when only positive and unlabeled examples are available. The experimental results cover uncertain data in the PU Learning scenario. They demonstrate that our algorithms, which exploit the uncertainty in the dataset, can potentially achieve better classification performance than traditional naive Bayes, which ignores that uncertainty.

(2) When a naive Bayes classifier is built in the PU Learning scenario, the user must supply the prior probability of the positive class as a parameter. An approach for estimating this parameter automatically is therefore needed.
We adopt two approaches to estimate the prior probability of the positive class. The first approach uses a validation set that contains only positive and unlabeled examples. It evaluates the resulting classifier on that set with a performance measure similar to F1. It then searches {0.1, 0.2, …, 0.9} and takes as the prior the value for which the classifier performs best on the validation set. The second approach uses the method proposed by Elkan and Noto (2008) under the "selected completely at random" assumption. It estimates the prior probability of the positive class directly from the positive and unlabeled examples, and that estimate is then used to build the naive Bayes classifier in the PU Learning scenario. The experimental results show that both approaches generally achieve satisfactory classification performance on uncertain data. They also free the user from having to supply the prior probability of the positive class.
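
The categorical extension in (1) can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not spell out:

- Each uncertain attribute value is a probability distribution over its domain.
- "Probability cardinality" is taken to be the expected count, that is, the sum of the probabilities that the examples place on a value.
- The negative-class conditionals come from the PNB-style relation P(v) = p·P(v | +) + (1 - p)·P(v | -). Here P(v) is estimated on the unlabeled set and p is the positive-class prior.
- A test instance is scored by its expected likelihood.

The function names (`train_upnb`, `log_score`, etc.) and the toy data are hypothetical.

```python
import math

def prob_cardinality(examples, attr, value):
    """Probability cardinality (assumed): expected number of examples whose `attr` equals `value`."""
    return sum(ex[attr].get(value, 0.0) for ex in examples)

def train_upnb(positives, unlabeled, domains, prior_pos, alpha=1.0):
    """Hypothetical uncertain PNB: per-attribute tables P(v | +) and P(v | -)."""
    n_p, n_u = len(positives), len(unlabeled)
    cond_pos, cond_neg = [], []
    for attr, domain in enumerate(domains):
        k = len(domain)
        pos_tab, neg_tab = {}, {}
        for v in domain:
            # Laplace-smoothed estimates built from probability cardinalities.
            pos_tab[v] = (prob_cardinality(positives, attr, v) + alpha) / (n_p + alpha * k)
            p_v = (prob_cardinality(unlabeled, attr, v) + alpha) / (n_u + alpha * k)
            # PNB-style relation P(v) = p P(v|+) + (1 - p) P(v|-), solved for P(v|-) and floored.
            neg_tab[v] = max((p_v - prior_pos * pos_tab[v]) / (1 - prior_pos), 1e-6)
        z = sum(neg_tab.values())
        cond_pos.append(pos_tab)
        cond_neg.append({v: q / z for v, q in neg_tab.items()})  # renormalize after flooring
    return cond_pos, cond_neg

def log_score(example, cond, class_prior):
    """log P(c) + sum over attributes of log sum_v P(x_i = v) * P(v | c) (expected likelihood)."""
    s = math.log(class_prior)
    for attr, dist in enumerate(example):
        s += math.log(sum(p * cond[attr][v] for v, p in dist.items()))
    return s

def predict(example, model, prior_pos):
    """True if the positive class has the higher posterior score."""
    cond_pos, cond_neg = model
    return log_score(example, cond_pos, prior_pos) > log_score(example, cond_neg, 1 - prior_pos)

# Toy data: one uncertain attribute over the domain {"a", "b"}.
positives = [[{"a": 0.9, "b": 0.1}], [{"a": 0.8, "b": 0.2}], [{"a": 1.0}]]
unlabeled = [[{"a": 0.9, "b": 0.1}], [{"b": 1.0}], [{"a": 0.2, "b": 0.8}],
             [{"a": 0.1, "b": 0.9}], [{"a": 1.0}]]
model = train_upnb(positives, unlabeled, [["a", "b"]], prior_pos=0.4)
```

With p = 0.4 the sketch labels a certain "a" as positive and a certain "b" as negative. Flooring and renormalizing the negative table guards against the subtraction going below zero when p is overestimated.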
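
The numeric extension in (1) can be sketched in the spirit of formula-based naive Bayes for uncertain data. The sketch assumes that each uncertain value is Gaussian with a known mean and standard deviation, and that each class-conditional density is Gaussian. Under those assumptions, integrating their product over the unknown true value gives the closed form N(mu_x; mean_c, var_c + sd_x^2). The abstract does not state how the negative class is obtained. Here, as an assumption, its moments are recovered by treating the unlabeled set as a mixture with weight p on the positive class. All names and data are hypothetical.

```python
import math

def moments(values):
    """First and second moments of uncertain values given as (mean, std) pairs.

    The second moment adds each value's own variance (law of total variance),
    so measurement uncertainty widens the fitted class distribution.
    """
    n = len(values)
    m1 = sum(mu for mu, _ in values) / n
    m2 = sum(mu * mu + sd * sd for mu, sd in values) / n
    return m1, m2

def fit_pu_gaussian(positives, unlabeled, prior_pos, min_var=1e-6):
    """Hypothetical PU fit for one attribute: (mean, var) for the positive and negative classes."""
    p1, p2 = moments(positives)
    u1, u2 = moments(unlabeled)
    # Assumed mixture relation E_U[.] = p * E_+[.] + (1 - p) * E_-[.], solved for E_-.
    n1 = (u1 - prior_pos * p1) / (1 - prior_pos)
    n2 = (u2 - prior_pos * p2) / (1 - prior_pos)
    # Floor the variances: the subtraction can go negative on small samples.
    return (p1, max(p2 - p1 * p1, min_var)), (n1, max(n2 - n1 * n1, min_var))

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def uncertain_likelihood(value, cls):
    """Closed form of the integral over t of N(t; mu_x, sd_x^2) * N(t; mean_c, var_c)."""
    mu_x, sd_x = value
    mean_c, var_c = cls
    return gaussian_pdf(mu_x, mean_c, var_c + sd_x ** 2)

def predict(value, model, prior_pos):
    """True if the prior-weighted positive likelihood is larger."""
    pos, neg = model
    return (prior_pos * uncertain_likelihood(value, pos)
            > (1 - prior_pos) * uncertain_likelihood(value, neg))

# Toy data: uncertain values as (mean, std); unlabeled is one third positive.
positives = [(5.0, 0.5), (5.5, 0.3), (4.5, 0.4), (5.2, 0.2)]
hidden_negatives = [(1.0, 0.5), (1.2, 0.2), (0.8, 0.3), (1.0, 0.4)]
unlabeled = positives + hidden_negatives * 2
model = fit_pu_gaussian(positives, unlabeled, prior_pos=1 / 3)
```

The toy unlabeled set is built from the positives plus negatives in exactly a 1:2 ratio, so the recovered negative mean and variance (1.0 and 0.155) match the hidden negatives. On real samples the subtraction is noisy, which is why the variances are floored.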
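
The first prior-estimation approach in (2) is a grid search over {0.1, ..., 0.9}. The abstract says only that the measure is "similar to F1". This sketch assumes the criterion recall^2 / P(y_hat = 1), which needs no negative labels. Recall is measured on the positive validation examples, and P(y_hat = 1) is measured over all validation examples. The toy `train_fn` is a hypothetical stand-in for the PU naive Bayes classifier: it applies the Bayes decision rule for two equal-variance Gaussians with fixed class means, so only the prior moves its threshold.

```python
import math

def pu_score(predict, val_pos, val_unl):
    """F1-like PU criterion recall^2 / P(y_hat = 1), computed without negative labels."""
    recall = sum(predict(x) for x in val_pos) / len(val_pos)
    val_all = val_pos + val_unl
    frac_pos = sum(predict(x) for x in val_all) / len(val_all)
    return 0.0 if frac_pos == 0 else recall ** 2 / frac_pos

def select_prior(train_fn, val_pos, val_unl, grid=None):
    """Return the grid value whose classifier scores best (the first one wins ties)."""
    if grid is None:
        grid = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    return max(grid, key=lambda p: pu_score(train_fn(p), val_pos, val_unl))

def train_fn(prior, mean_pos=10.0, mean_neg=2.0, var=4.0):
    """Hypothetical classifier: predict positive when p N(x; m+, s^2) > (1 - p) N(x; m-, s^2)."""
    threshold = ((mean_pos + mean_neg) / 2
                 + var / (mean_pos - mean_neg) * math.log((1 - prior) / prior))
    return lambda x: x > threshold

val_pos = [5.5, 8.0, 9.0, 10.0]
val_unl = [5.2, 6.5, 7.0, 9.0, 1.0, 2.0, 3.0, 2.0, 1.0, 3.0]
best_prior = select_prior(train_fn, val_pos, val_unl)
```

On the toy data the search returns 0.8. At that threshold all four positive validation examples are recovered while only half of the validation set is flagged, which gives a score of 2.0.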
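
The second approach in (2) follows Elkan and Noto (2008). Under "selected completely at random" (SCAR), the labeled positives are a random sample of all positives. Therefore P(s = 1 | x) = c · P(y = 1 | x), where c = P(s = 1 | y = 1) is a constant, and consequently P(y = 1) = P(s = 1) / c. A minimal sketch proceeds in three steps:

1. Train a "nontraditional" classifier g(x) ≈ P(s = 1 | x) that separates labeled from unlabeled examples.
2. Estimate c as the average of g over held-out labeled positives.
3. Recover the prior from c.

The abstract does not say which classifier plays the role of g. The 1-D logistic regression, the toy Gaussian data and the function names here are assumptions for illustration.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ss, lr=0.1, epochs=500):
    """Nontraditional classifier g(x) ~= P(s = 1 | x): 1-D logistic regression, batch gradient descent."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, s in zip(xs, ss):
            err = sigmoid(w * x + b) - s  # gradient of the log loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return lambda x: sigmoid(w * x + b)

def estimate_c(g, held_out_labeled):
    """Elkan and Noto's e1 estimator: c ~= mean of g over held-out labeled positives."""
    return sum(g(x) for x in held_out_labeled) / len(held_out_labeled)

def estimate_prior(c, n_labeled, n_unlabeled):
    """Under SCAR, P(s=1) = c * P(y=1). Also returns the share of positives in the unlabeled set."""
    p_s = n_labeled / (n_labeled + n_unlabeled)
    p_y = min(p_s / c, 1.0)
    return p_y, (p_y - p_s) / (1.0 - p_s)

# Toy SCAR data (assumed): 200 positives ~ N(2, 1), 200 negatives ~ N(-2, 1);
# a random half of the positives is labeled.
rng = random.Random(0)
pos = [rng.gauss(2, 1) for _ in range(200)]
neg = [rng.gauss(-2, 1) for _ in range(200)]
labeled, unlabeled = pos[:100], pos[100:] + neg
train_lab, held_out = labeled[:80], labeled[80:]
g = fit_logistic(train_lab + unlabeled, [1] * len(train_lab) + [0] * len(unlabeled))
c = estimate_c(g, held_out)
# g was fitted on train_lab vs. unlabeled, so P(s = 1) must use those same counts.
p_y, p_y_in_u = estimate_prior(c, len(train_lab), len(unlabeled))
```

The estimates are only approximate. A logistic g cannot exactly represent c · P(y = 1 | x), whose maximum is c < 1, so some bias in c carries over into the prior.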

Keywords/Search Tags:

uncertain data, Bayesian classification, positive unlabeled learning, naive Bayes

Related items

1	Research On Uncertain Data Classification Based On PU Learning And Bayesian Network
2	Study Of Classifying Uncertain Data Streams
3	Research And Implementation Of Classification Algorithm For Positive And Unlabeled Examples Learning On Uncertain Data Stream
4	Research On Positive Unlabeled Learning Algorithms For Graph Data Classification And System Implementation
5	Research On Positive Unlabeled Learning Algorithms For Text And Time Series Data
6	A Study On Learning From Positive And Unlabeled Examples
7	Intrusion Detection Technology Research Based On Positive-unlabeled Learning
8	Maximize AUC With Outlier Detection For Positive-unlabeled Classification And Incremental Algorithm
9	Research On Positive And Unlabeled Learning By Random Forest
10	Large-Scale Positive And Unlabeled Learning