Research On Positive Unlabeled Learning Algorithms For Graph Data Classification And System Implementation

Posted on:2022-11-23

Degree:Master

Type:Thesis

Country:China

Candidate:H Chen

Full Text:PDF

GTID:2518306776478394

Subject:Computer Software and Application of Computer

Abstract/Summary:

Graph data is ubiquitous in daily life and in scientific fields, and graph data classification has long been an active topic in data mining. Graph classification tasks fall mainly into node-level classification and graph-level classification. When training a model, traditional graph classification algorithms require the user to provide a labeled node-level or graph-level set that covers all categories. In many practical applications, however, users can provide only a small number of samples of interest as positive samples, yet expect the system to identify other samples of interest. This kind of problem can be modeled as a positive unlabeled learning (PU learning) problem for graph data classification. This thesis proposes two PU learning algorithms, one for node-level classification and one for graph-level classification, and uses the node-level algorithm to design and implement a paper recommendation prototype system. The main research contents and results are as follows:

(1) Most existing PU learning methods extract only node representations and infer node labels independently. To address this deficiency, this thesis proposes positive unlabeled learning based on collective inference (PUCI), which obtains node representations, local node label dependencies, and positive node association information from the positive and unlabeled nodes and uses them to infer the classes of unlabeled nodes. First, a positive correlation degree is calculated with a similarity-based personalized PageRank algorithm. Second, graph neural networks are used to construct a local classifier and a relational classifier, which are optimized iteratively with the EM algorithm. The local classifier uses node representations and the positive correlation degree to predict the classes of unlabeled nodes, while the relational classifier uses node label dependencies and the positive correlation degree to iteratively update node labels. Finally, PU learning is performed with a mixture of non-negative and unbiased risk estimators. Experiments on the real datasets Cora, Citeseer, and Pubmed show that, compared with the existing node-level PU learning algorithm LSDAN, PUCI improves the average F1 score by 5.31% across different positive labeling ratios. The results show that PU learning based on collective inference can effectively exploit the associations between nodes to improve classification.

(2) Most existing PU learning methods for graph-level classification use only graph structure information to identify reliable negative examples. To address this deficiency, this thesis proposes a PU learning algorithm based on multi-information fusion (GMI-Learning), which jointly uses the structural information, edge information, and node information of graphs to infer graph-level classes. First, a similarity index between each unlabeled graph and the known positive graphs is calculated from the structural information, edge information, and node information of a small number of labeled graphs. Second, the ranking of the similarity index is used to obtain reliable negative examples, and these reliable negatives transform the PU problem into a binary classification problem. Finally, graph convolution and graph pooling are used to obtain a graph-level representation, and a multi-layer perceptron serves as the classifier that infers graph-level classes. Experiments on the real datasets MUTAG, DHFR, PTC_FM, PTC_MM, PTC_FR, and PTC_MR show that, compared with the existing graph-level PU learning algorithm GPU-Learning, GMI-Learning improves the average F1 score by 4.86% across different positive labeling ratios. The results show that the PU learning algorithm based on multi-information fusion achieves stronger classification performance.

(3) Feature information is extracted from agricultural information content to construct agricultural information feature vectors, and the link relationships between agricultural information items are used to construct the graph model. Users add agricultural information of interest to their favorites, and the system must recommend other agricultural information of interest based on the favorites the user provides. This thesis uses the PUCI algorithm to build an agricultural information recommendation model, in which user favorites are treated as positive nodes and the large amount of other agricultural information as unlabeled nodes. Based on this model, an agricultural information recommendation prototype system is constructed.
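
The positive correlation degree described in (1) can be illustrated with a small sketch. The example below is plain personalized PageRank with restarts at the positive nodes, not the thesis's similarity-based variant. The function name `personalized_pagerank`, the restart probability `alpha`, the uniform transition weights, and the toy graph are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def personalized_pagerank(
    adjacency: Dict[int, List[int]],
    positives: Set[int],
    alpha: float = 0.15,
    iterations: int = 100,
) -> Dict[int, float]:
    """Power iteration for PageRank that restarts at the positive nodes.

    `adjacency` maps every node to its neighbours (each undirected edge is
    listed in both directions). `alpha` is the restart probability.
    Assumption: uniform transition weights; a similarity-based variant
    would weight the neighbours instead.
    """
    nodes = list(adjacency)
    # Restart distribution: uniform over the known positive nodes.
    restart = {v: (1.0 / len(positives) if v in positives else 0.0) for v in nodes}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {v: alpha * restart[v] for v in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if neighbours:
                # Spread the walking mass of u evenly over its neighbours.
                share = (1.0 - alpha) * score[u] / len(neighbours)
                for v in neighbours:
                    nxt[v] += share
            else:
                # Dangling node: send its mass back to the restart distribution.
                for v in nodes:
                    nxt[v] += (1.0 - alpha) * score[u] * restart[v]
        score = nxt
    return score


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
toy_graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
positive_nodes = {0}
scores = personalized_pagerank(toy_graph, positive_nodes)

# Rank the unlabeled nodes by their association with the positive node.
ranked = sorted(
    (v for v in toy_graph if v not in positive_nodes),
    key=lambda v: scores[v],
    reverse=True,
)
```

In this toy graph, the nodes that share a triangle with the positive node 0 score higher than the nodes in the far triangle. A score of this kind is the sort of signal that a PU classifier, or a favorites-based recommender as in (3), could use to rank unlabeled nodes.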

Keywords/Search Tags:

collective inference, positive unlabeled learning, node-level classification, graph-level classification

Related items

1	Bayesian Classifier For Positive Unlabeled Learning With Uncertainty
2	A Study On Learning From Positive And Unlabeled Examples
3	Unlabeled level planarity
4	Research On Positive Unlabeled Learning Algorithms For Text And Time Series Data
5	Intrusion Detection Technology Research Based On Positive-unlabeled Learning
6	Maximize AUC With Outlier Detection For Positive-unlabeled Classification And Incremental Algorithm
7	Research On Positive And Unlabeled Learning By Random Forest
8	Spammer Detection Using Graph-level Classification Model Based On Graph Neural Network
9	Research On Aspect-level Sentiment Classification Based On Deep Learning
10	Research On Node Classification Model Based On Multi-level Graph Attention Convolutional Neural Network