
Partially Supervised Learning Research Based On Generated Examples

Posted on: 2020-08-01 | Degree: Master | Type: Thesis
Country: China | Candidate: S Y He | Full Text: PDF
GTID: 2438330620455603 | Subject: Software engineering
Abstract/Summary:
Partially supervised learning (PU learning) trains a classification model when only positive and unlabeled data are available; the unlabeled dataset contains both positive and negative examples. The core strategy of PU learning is to filter reliable negative examples out of the unlabeled dataset and then turn the problem into either a supervised one (when all unlabeled data are treated as negative) or a semi-supervised one (when a small number of negative examples are identified in the unlabeled data and a large amount of unlabeled data remains). A supervised or semi-supervised learning method is then used to train the classification model; experiments show that the latter performs better. This thesis focuses on the core issues of PU learning and completes the following three lines of research.

(1) Classification methods boosted by partially supervised learning. The spy strategy labels an unlabeled example as negative when its predicted probability of belonging to the positive class is lower than the minimum probability assigned to the "spy" examples (a minimal sketch of this strategy follows the abstract). Building on this strategy, the thesis evaluates PU learning with Naive Bayes and Logistic Regression and compares it with supervised learning using SVM and CNN. Because the "spy" examples are selected at random, the negative examples found through them have two shortcomings: their reliability is weak, and they may lie far from the decision boundary, so they are not optimal examples for training the current classifier.

(2) Example generation based on adversarial classifier reverse engineering (ACRE). The proposed method uses reverse engineering to generate examples for two purposes: the discriminator ensures that each generated example is a negative example, which further reduces data noise; and the method generates pseudo-examples that do not exist in the training sample space yet lie close to the decision boundary (an illustrative generation sketch follows the abstract). Compared with the negative examples identified by the spy strategy, the generated examples are more reliable and carry more information entropy. Higher reliability reduces data noise, and proximity to the decision boundary shrinks the parameter search space during training, improving training efficiency.

(3) Feasibility verification of pseudo-example generation for improving classification performance. The thesis verifies directly that the generated examples approach the decision boundary during iteration by plotting the trend of the distance between the generated examples and the boundary (a distance-computation sketch follows the abstract), and verifies it indirectly by reducing the dimensionality of the vectors and drawing a two-dimensional scatter plot. Further experiments show that the generated examples improve model prediction performance by up to 3% and yield stronger robustness, but too many pseudo-examples degrade classification performance: because pseudo-examples are not constrained by the rules of the real data, some of them do not conform to the distribution of real samples. Future work can therefore focus on detecting such pseudo-examples. In addition, this work offers some guidance for research on the security of machine learning.
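The spy strategy from item (1) can be summarized by the following minimal sketch. It is an illustration only: the 15% spy fraction, the choice of GaussianNB, and the function names are assumptions, not details taken from the thesis.

```python
# Minimal sketch of the "spy" strategy for finding reliable negatives in PU
# learning. The spy fraction and the Naive Bayes variant are illustrative
# assumptions, not choices confirmed by the thesis.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def find_reliable_negatives(X_pos, X_unlabeled, spy_frac=0.15, seed=None):
    """Return the unlabeled examples judged to be reliable negatives."""
    rng = np.random.default_rng(seed)

    # 1. Randomly move a fraction of positives ("spies") into the unlabeled set.
    n_spy = max(1, int(spy_frac * len(X_pos)))
    spy_mask = np.zeros(len(X_pos), dtype=bool)
    spy_mask[rng.choice(len(X_pos), size=n_spy, replace=False)] = True
    X_spy, X_p = X_pos[spy_mask], X_pos[~spy_mask]

    # 2. Train a classifier treating (unlabeled + spies) as the negative class.
    X_train = np.vstack([X_p, X_unlabeled, X_spy])
    y_train = np.concatenate([np.ones(len(X_p)),
                              np.zeros(len(X_unlabeled) + len(X_spy))])
    clf = GaussianNB().fit(X_train, y_train)

    # 3. The lowest posterior P(positive | spy) becomes the threshold:
    #    unlabeled examples scoring below it are taken as reliable negatives.
    threshold = clf.predict_proba(X_spy)[:, 1].min()
    p_unlabeled = clf.predict_proba(X_unlabeled)[:, 1]
    return X_unlabeled[p_unlabeled < threshold]
```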
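Item (2) generates pseudo-examples that lie close to the decision boundary while remaining negative. The abstract does not spell out the generation procedure, so the sketch below is only one simple way such examples could be produced: interpolate from a reliable negative toward a positive example and keep the last point the discriminator still labels negative. The `discriminator` interface, the label convention, and the step count are assumptions for illustration, not the thesis's ACRE implementation.

```python
# Illustrative sketch (not the thesis's ACRE procedure): create a
# pseudo-negative near the decision boundary by line search between a
# reliable negative and a positive example. Assumes the discriminator's
# predict() returns 1 for the positive class.
import numpy as np

def generate_near_boundary(x_neg, x_pos, discriminator, n_steps=20):
    """Walk from a reliable negative toward a positive example and return
    the last interpolated point the discriminator still calls negative."""
    candidate = x_neg
    for t in np.linspace(0.0, 1.0, n_steps):
        x = (1.0 - t) * x_neg + t * x_pos            # step toward the positive
        if discriminator.predict(x.reshape(1, -1))[0] == 1:
            break                                     # crossed the boundary; stop
        candidate = x                                 # still negative; keep it
    return candidate
```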
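Item (3) tracks the distance between generated examples and the decision boundary across iterations. Assuming a linear classifier such as logistic regression, the signed point-to-hyperplane distance can be computed as sketched below; the choice of classifier and of this particular distance measure is an assumption, not a detail given in the abstract.

```python
# Hedged sketch of the verification step: distance of examples to a linear
# decision boundary w.x + b = 0 (logistic regression assumed).
import numpy as np
from sklearn.linear_model import LogisticRegression

def distance_to_boundary(clf: LogisticRegression, X: np.ndarray) -> np.ndarray:
    """Signed distance of each row of X to the classifier's hyperplane."""
    w = clf.coef_.ravel()
    b = clf.intercept_[0]
    return (X @ w + b) / np.linalg.norm(w)

# Usage idea: plot the mean absolute distance of each generated batch per
# iteration; a decreasing trend indicates the examples approach the boundary.
# mean_dist = [np.abs(distance_to_boundary(clf, X_gen)).mean()
#              for X_gen in generated_batches]
```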
Keywords/Search Tags:PU Learning, Adversarial Training, Generative Examples, ACRE