
Research On Extraction Of Reliable Negative Instances In Semi-supervised PU Learning

Posted on: 2021-12-07 | Degree: Master | Type: Thesis
Country: China | Candidate: T T Li
GTID: 2518306194491274 | Subject: Computer application technology
Abstract/Summary:
Semi-supervised learning trains a model on a large number of "cheap" unlabeled instances together with a small number of "expensive" labeled instances. PU (positive and unlabeled) learning is a special branch of semi-supervised learning. In traditional semi-supervised binary classification, the labeled training instances include both positive and negative instances, whereas PU learning trains only on labeled positive instances and unlabeled instances, with no labeled negative instances. Because the initial labeled set in PU learning contains no negative instances, a classifier must first be constructed to find the reliable negative instances hidden among the unlabeled instances. These reliable negative instances are then added to the initial labeled set, and a new classifier is finally trained to classify the unlabeled instances.

However, selecting reliable negative instances by constructing classifiers in PU learning raises several problems, for example: how to effectively mine the spatial structure of the few initial positive instances in order to select reliable negative instances; how to prevent noise points and outliers from distorting the extraction of reliable negative instances; how to overcome the low efficiency of separating out reliable negative instances when the spy technique selects its spy instances at random; and how to ensure the purity of the instances that remain after the reliable negative instances have been extracted from the unlabeled set. This thesis studies two of these problems: the difficulty of effectively mining the spatial structure of the data set when extracting reliable negative instances in PU learning, and the sensitivity of the extraction to noise points. The research work includes the following:

(1) A PU learning method that selects reliable negative instances from the unlabeled instances based on data fuzziness is proposed. The method first performs semi-supervised clustering on the positive and unlabeled instances and uses the clustering results to divide the data by fuzziness. Low-fuzziness data close to the positive instances are used to expand the initial positive set, while low-fuzziness data far from the positive instances are taken as reliable negative instances. The high-fuzziness data among the unlabeled instances are then edited. Finally, a classifier is trained on the expanded labeled set to classify the initial unlabeled instances.

(2) A PU learning method combining the spy technique with semi-supervised self-training is proposed. The method uses the spy technique to extract reliable negative instances from the unlabeled instances. The remaining instances are treated as a new unlabeled set, which is purified by self-training, and the reliable negative instances missed in the first step are recovered by a second round of training.

(3) Building on the combination of the spy technique and semi-supervised self-training in (2), the spy technique itself is improved. By mining the spatial distribution of the initial positive instances, the cluster center of the positive instances is computed, and the instances closest to this center are chosen as spy instances. These redefined spies lie closer to the cluster center in the spatial structure and therefore carry more accurate information, so using them as spies reflects the distribution of the unknown positive instances within the unlabeled set more effectively.
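The fuzziness-based selection in (1) can be illustrated with a small sketch. It is not the thesis's implementation: the abstract names neither its semi-supervised clustering algorithm nor its fuzziness measure, so the code assumes a constrained 2-means in which labeled positives are pinned to the positive cluster (`seeded_two_means`), a fuzzy c-means membership with fuzzifier m = 2 (`positive_membership`), and binary entropy as the fuzziness measure (`fuzziness`), with a hypothetical cut-off `tau = 0.5`. All function names are illustrative. High-fuzziness points are only returned, not edited, because the abstract does not say how the editing is done.

```python
import math

# Hypothetical sketch: the clustering algorithm, membership function, and
# fuzziness measure are assumptions, not taken from the thesis.


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def seeded_two_means(positives, unlabeled, max_iter=20):
    """Constrained 2-means: labeled positives always stay in the positive
    cluster. The negative center starts at the unlabeled point farthest from
    the positive centroid. Returns (positive_center, negative_center)."""
    c_pos = centroid(positives)
    c_neg = max(unlabeled, key=lambda x: math.dist(x, c_pos))
    assign = None
    for _ in range(max_iter):
        # True means the unlabeled point is assigned to the positive cluster.
        new_assign = [math.dist(x, c_pos) <= math.dist(x, c_neg) for x in unlabeled]
        if new_assign == assign:
            break  # assignments are stable
        assign = new_assign
        c_pos = centroid(positives + [x for x, a in zip(unlabeled, assign) if a])
        negatives = [x for x, a in zip(unlabeled, assign) if not a]
        if negatives:
            c_neg = centroid(negatives)
    return c_pos, c_neg


def positive_membership(x, c_pos, c_neg):
    """Fuzzy c-means membership (fuzzifier m = 2) of x in the positive cluster."""
    d_pos, d_neg = math.dist(x, c_pos), math.dist(x, c_neg)
    if d_pos == 0:
        return 1.0
    if d_neg == 0:
        return 0.0
    return 1.0 / (1.0 + (d_pos / d_neg) ** 2)


def fuzziness(u):
    """Binary entropy in bits: 0 for a crisp membership, 1 at u = 0.5."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -(u * math.log2(u) + (1 - u) * math.log2(1 - u))


def split_by_fuzziness(positives, unlabeled, tau=0.5):
    """Return (expanded_positives, reliable_negatives, high_fuzzy)."""
    c_pos, c_neg = seeded_two_means(positives, unlabeled)
    expanded, reliable_neg, high_fuzzy = list(positives), [], []
    for x in unlabeled:
        u = positive_membership(x, c_pos, c_neg)
        if fuzziness(u) > tau:
            high_fuzzy.append(x)    # ambiguous point: candidate for editing
        elif u >= 0.5:
            expanded.append(x)      # crisp and close to the positives
        else:
            reliable_neg.append(x)  # crisp and far from the positives
    return expanded, reliable_neg, high_fuzzy


P = [(0, 0), (0, 1), (1, 0)]
U = [(0.5, 0.5), (10, 10), (10, 11), (5, 5.5)]
expanded, reliable_neg, high_fuzzy = split_by_fuzziness(P, U)
print(reliable_neg)  # [(10, 10), (10, 11)]
print(high_fuzzy)    # [(5, 5.5)]
```

On this toy data, (0.5, 0.5) is crisp and close to the positives, so it expands the positive set, while (5, 5.5), which sits between the two clusters, is the one flagged as highly fuzzy.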
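The spy step described in (2), with the center-based spy choice of (3), might look roughly like the sketch below. It is an illustration under assumptions: a nearest-centroid scorer (`pos_score`) stands in for whatever classifier the thesis trains, the threshold is simply the lowest spy score, and the names `choose_spy_indices`, `spy_reliable_negatives`, `ratio`, and `near_center` are hypothetical. Setting `near_center=False` falls back to the classic random spy selection for comparison.

```python
import math
import random

# Hypothetical sketch of the spy step: the scorer, the threshold rule, and
# all names are assumptions, not the thesis's implementation.


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def pos_score(x, c_pos, c_neg):
    """Score in [0, 1]; larger means x lies relatively closer to c_pos."""
    d_pos, d_neg = math.dist(x, c_pos), math.dist(x, c_neg)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total


def choose_spy_indices(positives, ratio, near_center, seed=0):
    """Spy indices: the positives nearest the positive centroid when
    near_center is True, otherwise a random sample (classic spy technique)."""
    k = max(1, round(ratio * len(positives)))
    if near_center:
        c = centroid(positives)
        by_distance = sorted(range(len(positives)), key=lambda i: math.dist(positives[i], c))
        return set(by_distance[:k])
    return set(random.Random(seed).sample(range(len(positives)), k))


def spy_reliable_negatives(positives, unlabeled, ratio=0.3, near_center=True, seed=0):
    """Return (reliable_negatives, remaining_unlabeled).
    Assumes ratio leaves at least one non-spy positive."""
    spy_idx = choose_spy_indices(positives, ratio, near_center, seed)
    spies = [positives[i] for i in sorted(spy_idx)]
    kept_pos = [p for i, p in enumerate(positives) if i not in spy_idx]
    # "Train" positives-minus-spies against unlabeled-plus-spies.
    c_pos = centroid(kept_pos)
    c_neg = centroid(list(unlabeled) + spies)
    # Spies are known positives, so anything scoring below every spy
    # is treated as a reliable negative.
    threshold = min(pos_score(s, c_pos, c_neg) for s in spies)
    reliable_neg, remaining = [], []
    for x in unlabeled:
        if pos_score(x, c_pos, c_neg) < threshold:
            reliable_neg.append(x)
        else:
            remaining.append(x)
    return reliable_neg, remaining


P = [(0, 0), (0, 2), (2, 0), (2, 2), (0.9, 1), (1.1, 1)]
U = [(1, 1), (8, 8), (9, 8), (8, 9)]
rn, rest = spy_reliable_negatives(P, U)
print(rn)    # [(8, 8), (9, 8), (8, 9)]
print(rest)  # [(1, 1)]
```

Spies taken near the positive center tend to score high, which raises the threshold and marks more unlabeled instances as reliable negatives than a randomly drawn outlier spy would. This is one way to read the efficiency gain claimed in (3); it also explains why purifying the remaining instances afterwards is useful.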
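One plausible reading of the self-training purification and second round of training in (2) is an iterative loop: retrain on the positives and the current reliable negatives, move every remaining instance that the model scores as confidently negative into the reliable-negative set, and stop once nothing moves. The sketch below assumes a nearest-centroid scorer and a hypothetical cut-off `neg_cut = 0.3`; `self_train_purify` is an illustrative name, not the thesis's.

```python
import math

# Hypothetical sketch: the nearest-centroid scorer and the fixed cut-off
# neg_cut stand in for the thesis's classifier and confidence rule.


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def pos_score(x, c_pos, c_neg):
    """Score in [0, 1]; larger means x lies relatively closer to c_pos."""
    d_pos, d_neg = math.dist(x, c_pos), math.dist(x, c_neg)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total


def self_train_purify(positives, reliable_neg, remaining, neg_cut=0.3, max_rounds=10):
    """Move confidently negative points from `remaining` into the reliable
    negatives, retraining each round. Requires a non-empty reliable_neg.
    Returns (reliable_negatives, purified_remaining)."""
    rn, rest = list(reliable_neg), list(remaining)
    for _ in range(max_rounds):
        # Retrain on the current labeled sets: positives vs. reliable negatives.
        c_pos, c_neg = centroid(positives), centroid(rn)
        moved, kept = [], []
        for x in rest:
            if pos_score(x, c_pos, c_neg) < neg_cut:
                moved.append(x)  # confidently negative: recover it
            else:
                kept.append(x)
        if not moved:
            break  # nothing new recovered; the remaining set is stable
        rn.extend(moved)
        rest = kept
    return rn, rest


P = [(0, 0), (0, 1), (1, 0), (1, 1)]
rn, rest = self_train_purify(P, [(10, 10)], [(8, 8), (7, 7), (0.6, 0.4)])
print(rn)    # [(10, 10), (8, 8), (7, 7)]
print(rest)  # [(0.6, 0.4)]
```

Here (7, 7) is missed in the first round and recovered in the second, once (8, 8) has joined the reliable negatives and pulled their centroid toward it.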
Keywords/Search Tags: positive and unlabeled learning, reliable negative instances, data fuzziness, spy technology, self-training method