
Study On PU Learning Based On Associative Classification Algorithm

Posted on: 2018-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2428330512992719
Subject: Information Science
Abstract/Summary:
In practical settings such as user recommendation, text classification, and image classification, it is common to have only a few labeled examples of a certain class alongside a large number of unlabeled examples. The goal is to use the labeled examples to identify, among the unlabeled examples, those that belong to the same class. The unlabeled examples obviously cannot all be treated as negative examples, because some of them are unidentified members of the labeled class and share similar features with the labeled examples. To address this problem, Denis and others proposed PU learning theory, and PU learning algorithms followed.

Research on PU learning, both in China and abroad, focuses on three settings: PU learning on static certain data, PU learning on static uncertain data, and PU learning on data streams. This dissertation focuses on PU learning on static certain data. PU learning algorithms for static certain data fall into three types. The first type learns only from labeled positive examples. The second type first extracts reliable negative examples from the unlabeled examples, and then applies a conventional machine learning algorithm, one that learns only from positive and negative examples, to the labeled positive examples and the reliable negatives. The third type learns directly from labeled positive examples and unlabeled examples, treating the unlabeled examples as negative examples and the unidentified positive examples among them as noise in the negative class.

Rule-based algorithms are widely used because the classifiers they build are easy to interpret and because they can handle both continuous numeric features and categorical features. However, in the papers surveyed, most rule-based PU learning algorithms are built on decision tree algorithms, and few are built on associative classification algorithms. The author therefore proposes a new PU learning algorithm based on associative classification. The proposed algorithm consists of four steps. First, generate class association rules. Second, adjust the confidence of the rules, which is influenced by the imbalanced distribution of examples. Third, compute the relative confidence of the class association rules. Fourth, classify examples according to the relative confidence of the class association rules. It can be proven that, because of the unidentified positive examples among the unlabeled examples, the observed confidence of a positive class association rule is lower than its true value, and the observed confidence of an unlabeled class association rule is higher than its true value. The key to improving classification accuracy is finding discriminative class association rules. The author therefore proposes that the reliability of a class association rule's classification result can be measured by its relative confidence.

Finally, two experiments compare the classification accuracy of the proposed PU learning algorithm with that of the CBA algorithm and the POSC4.5 algorithm on ten datasets from UCI. Each dataset is divided into a training set and a testing set in a 50%:50% proportion using random stratified sampling. The unlabeled set is then generated by combining the original negative examples with a certain proportion of the positive examples in the training set. The proportions used are 0%, 30%, 60%, and 90%. This simulates PU learning situations in which different proportions of the positive training examples become unidentified positive examples among the unlabeled examples. AUC is used as the evaluation metric. The experimental results show that the proposed PU learning algorithm performs better than CBA and POSC4.5 in PU learning situations. However, the experiments also show that the classification accuracy of both CBA and the proposed algorithm declines gradually as the proportion of unidentified positive examples grows, and that the results of POSC4.5 are more stable than those of the proposed algorithm.
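
The experimental protocol in the abstract (a 50%:50% stratified split, then hiding 0%, 30%, 60%, or 90% of the positive training examples among the unlabeled examples) can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names, the toy dataset, and the single-feature rule are all assumptions made for illustration. The sketch also shows, on toy data, the direction of the confidence bias the abstract describes: the observed confidence of a positive rule falls, and that of the matching unlabeled rule rises, as more positives are hidden.

```python
import random


def stratified_split(labels, test_fraction=0.5, seed=0):
    """Split indices into train/test, sampling each class separately."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * (1 - test_fraction))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def make_pu_labels(labels, train_idx, hidden_fraction, positive=1, seed=0):
    """Map each training index to 'P' (labeled positive) or 'U' (unlabeled).

    All negatives become 'U'; a fraction of the positives is hidden in 'U'.
    """
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i] == positive]
    rng.shuffle(pos)
    hidden = set(pos[:round(len(pos) * hidden_fraction)])
    return {
        i: "P" if labels[i] == positive and i not in hidden else "U"
        for i in train_idx
    }


def rule_confidence(features, pu, value, target):
    """Observed confidence of the rule 'feature == value -> target'."""
    covered = [i for i in pu if features[i] == value]
    if not covered:
        return 0.0
    return sum(pu[i] == target for i in covered) / len(covered)


def observed_confidences():
    """Toy data (an assumption): feature is 1 for positives, 0 for negatives."""
    labels = [1] * 20 + [0] * 20
    features = list(labels)
    train_idx, _ = stratified_split(labels, test_fraction=0.5)
    result = {}
    for fraction in (0.0, 0.3, 0.6, 0.9):
        pu = make_pu_labels(labels, train_idx, fraction)
        result[fraction] = (
            rule_confidence(features, pu, 1, "P"),  # positive rule
            rule_confidence(features, pu, 1, "U"),  # unlabeled rule
        )
    return result


if __name__ == "__main__":
    for fraction, (conf_p, conf_u) in observed_confidences().items():
        print(f"hidden={fraction:.0%}  conf(x=1->P)={conf_p:.2f}  "
              f"conf(x=1->U)={conf_u:.2f}")
```

In this toy dataset the true confidence of the rule "feature = 1 implies positive" is 1, so any drop in its observed value comes entirely from the hidden positives. The confidence adjustment and relative-confidence steps of the proposed algorithm are not specified in the abstract, so they are not sketched here.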
Keywords/Search Tags:Associative Classification, PU, Positive and Unlabeled Examples, CBA