The Study On Positive And Unlabeled Learning By Label Propagation

Posted on: 2018-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: S X Ma
GTID: 2348330533957865
Subject: Computer Science and Technology
Abstract/Summary:
In many practical classification tasks, labeled negative examples are hard to obtain or unavailable altogether. PU learning, which trains a classifier using only positive and unlabeled examples, has therefore attracted considerable attention in recent years. A typical family of PU learning approaches builds the classifier with a two-step strategy. In step one, a set of reliable negative examples is identified in the given unlabeled set and treated as the negative set. In step two, an existing supervised or semi-supervised learning method is applied to train the classifier. The key to the two-step strategy is therefore the extraction of reliable negative examples. Most published two-step methods can hardly extract reliable negative examples in step one when only a small number of labeled positive examples is available. Graph-based semi-supervised learning is usually effective for classification when the labeled training set is small, and graph-based PU learning, which combines graph-based methods with the classical two-step methods of PU learning, has been proposed in recent years.

However, graph-based PU learning still has some problems: 1. Existing graph-based PU learning algorithms widely use a fully connected graph with a similarity measure based on Euclidean distance, but this is not a good way to measure the similarity between examples that are not close in Euclidean distance. 2. Even though a graph-based method is used, graph-based PU learning algorithms still cannot extract reliable negative examples with high precision.

To address these problems, we propose PU-LP, a novel PU learning algorithm that follows the two-step strategy. PU-LP has two innovations: 1. It computes the similarity matrix by applying the Katz index to a kNN graph. 2. Before extracting reliable negative examples, PU-LP extracts a set of reliable positive examples to enlarge the labeled positive set, and an iterative method for extracting these reliable positive examples is proposed. Experiments on UCI datasets show that PU-LP performs very well when only a small number of labeled positive examples is available, and that it outperforms the PNB algorithm.
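
The first innovation, a Katz-index similarity matrix computed on a kNN graph, can be illustrated with a minimal sketch. The abstract does not give the construction details, so everything specific here is an assumption: the graph is unweighted and symmetrised, neighbours are chosen by Euclidean distance, the damping factor `beta` defaults to half the reciprocal of the largest adjacency eigenvalue, and the helper names `knn_adjacency` and `katz_similarity` are hypothetical. The only standard fact relied on is the closed form of the Katz index, S = (I - beta*A)^(-1) - I, which equals the sum of beta^l * A^l over all path lengths l >= 1 when beta is below 1/lambda_max(A).

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric, unweighted kNN adjacency matrix (assumed construction)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all examples.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # an example is never its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest examples
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(A, A.T)  # keep an edge if either endpoint selected it


def katz_similarity(A, beta=None):
    """Katz index S = (I - beta*A)^-1 - I for a symmetric adjacency matrix A."""
    n = A.shape[0]
    lam_max = np.max(np.abs(np.linalg.eigvalsh(A)))
    if lam_max == 0:
        return np.zeros((n, n))  # graph without edges: no paths, no similarity
    if beta is None:
        beta = 0.5 / lam_max  # assumed default, safely inside the convergence range
    if beta * lam_max >= 1:
        raise ValueError("beta must be smaller than 1 / lambda_max(A)")
    identity = np.eye(n)
    return np.linalg.inv(identity - beta * A) - identity


# Toy data: two well-separated groups of three one-dimensional points.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
A = knn_adjacency(X, k=2)
S = katz_similarity(A)
```

In this toy example the kNN graph splits into two disconnected groups, so the Katz similarity is positive within a group and zero across groups. Because the index sums over paths of every length, two examples in the same group that are not direct neighbours would still receive a nonzero similarity, which is the property that a purely Euclidean measure on a fully connected graph lacks.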
Keywords/Search Tags:PU learning, label propagation, similarity measurement