
Research On Positive Unlabeled Learning Algorithms For Graph Data Classification And System Implementation

Posted on: 2022-11-23; Degree: Master; Type: Thesis
Country: China; Candidate: H Chen
GTID: 2518306776478394; Subject: Computer Software and Application of Computer
Abstract/Summary:
Graph data are ubiquitous in daily life and in scientific fields, and graph data classification has long been a hotspot in data mining. Graph classification tasks fall into two categories: node-level classification and graph-level classification. When training a model, traditional graph classification algorithms require the user to provide a labeled set of nodes or graphs that covers all categories. In many practical applications, however, users can supply only a small number of samples of interest as positive samples, yet expect the model to identify the remaining samples of interest. This problem can be modeled as positive-unlabeled learning (PU learning) for graph data classification. This thesis proposes two PU learning algorithms, one for node-level classification and one for graph-level classification, and uses the node-level algorithm to design and implement a paper recommendation prototype system. The main research contents and achievements are as follows:

(1) Most existing positive-unlabeled (PU) learning methods extract only node representations and infer each node's label independently. To address this deficiency, this thesis proposes positive-unlabeled learning based on collective inference (PUCI), which draws node representations, local label dependencies between nodes, and association information with positive nodes from the positive and unlabeled nodes, and uses them to infer the classes of the unlabeled nodes. First, the positive correlation degree of each node is computed with a similarity-based personalized PageRank algorithm. Second, graph neural networks are used to construct a local classifier and a relational classifier, which are optimized iteratively with the EM algorithm: the local classifier predicts the classes of unlabeled nodes from node representations and positive correlation degrees, while the relational classifier iteratively updates node labels from label dependencies between nodes and positive correlation degrees. Finally, PU learning is performed with a risk estimator that mixes the non-negative and unbiased risk estimators. Experiments on the real-world datasets Cora, Citeseer and Pubmed show that, compared with the existing node-level PU learning algorithm LSDAN, PUCI improves the averaged F1 score by 5.31% across different positive labeling ratios. These results indicate that PU learning based on collective inference can effectively exploit the information shared between nodes to improve classification.

(2) Most existing PU learning methods for graph-level classification use only graph structure information to identify reliable negative examples. To address this deficiency, this thesis proposes a positive-unlabeled learning algorithm based on multi-information fusion (GMI-Learning), which jointly uses the structural information, edge information and node information of graphs to infer graph-level classes. First, a similarity index between each unlabeled graph and the known positive graphs is calculated from the structural, edge and node information of the small number of labeled graphs. Second, the unlabeled graphs are ranked by this similarity index to obtain reliable negative examples, which turns the PU problem into a binary classification problem. Finally, graph convolution and graph pooling are used to obtain graph-level representations, and a multi-layer perceptron serves as the classifier that infers graph-level classes. Experiments on the real-world datasets MUTAG, DHFR, PTC_FM, PTC_MM, PTC_FR and PTC_MR show that, compared with the existing graph-level PU learning algorithm GPU-Learning, GMI-Learning improves the averaged F1 score by 4.86% across different positive labeling ratios. These results indicate that PU learning based on multi-information fusion achieves stronger classification performance.

(3) Feature information is extracted from the content of agricultural information items to build agricultural information feature vectors, and the link relationships between items are used to build a graph model. Users add agricultural information of interest to their favorites, and the system must recommend other agricultural information of interest based on the favorites page a user provides. This thesis uses the PUCI algorithm to build an agricultural information recommendation model in which the user's favorites are treated as positive nodes and the large amount of other agricultural information as unlabeled nodes. Based on this model, an agricultural information recommendation prototype system is constructed.
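PUCI's first step scores each node's positive correlation degree with a similarity-based personalized PageRank. The abstract does not say how similarity enters the random walk, so the following is only a minimal sketch under a stated assumption: each edge is weighted by the cosine similarity of its endpoints' feature vectors (clipped at zero), and the walk restarts uniformly at the labeled positive nodes. The names `positive_correlation` and `cosine` and the defaults `alpha=0.85`, `iters=200` are hypothetical, not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def positive_correlation(adj, feats, positives, alpha=0.85, iters=200, tol=1e-10):
    """Personalized PageRank that restarts at the labeled positive nodes.

    ASSUMPTION (hypothetical): edge (i, j) is weighted by max(cosine(x_i, x_j), 0).
    adj: node -> list of neighbours (undirected graph); feats: node -> feature vector.
    Returns node -> score; the scores sum to 1.
    """
    if not positives:
        raise ValueError("at least one positive node is required")
    nodes = list(adj)
    # Restart distribution: uniform over the labeled positive nodes.
    restart = {v: (1.0 / len(positives) if v in positives else 0.0) for v in nodes}
    # Similarity-weighted, row-normalised transition probabilities.
    trans = {}
    for i in nodes:
        w = {j: max(cosine(feats[i], feats[j]), 0.0) for j in adj[i]}
        total = sum(w.values())
        trans[i] = {j: wij / total for j, wij in w.items()} if total > 0 else {}
    score = dict(restart)
    for _ in range(iters):
        new = {v: (1 - alpha) * restart[v] for v in nodes}
        for i in nodes:
            if trans[i]:
                for j, p in trans[i].items():
                    new[j] += alpha * score[i] * p
            else:
                # Node without a positively weighted edge: return its mass to the positives.
                for v in nodes:
                    new[v] += alpha * score[i] * restart[v]
        diff = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if diff < tol:
            break
    return score
```

Usage: `positive_correlation({0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}, feats, positives={0})` ranks node 0 first and gives the peripheral node 3 the lowest score. Restarting only at positives is what makes the score a "closeness to the positive set" rather than a global centrality.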
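In PUCI the relational classifier iteratively updates node labels from neighbours' labels and positive correlation degrees, alternating with a GNN-based local classifier under EM. The sketch below is a deliberately simplified, hypothetical stand-in for that loop in the style of iterative classification: the local probabilities are assumed to be given (no GNN is trained), the relational estimate is a correlation-weighted mean of the neighbours' current probabilities, and the two are blended with an assumed weight `beta`. It is not the thesis's EM procedure.

```python
def collective_inference(adj, local_prob, corr, positives, beta=0.5, iters=50, tol=1e-8):
    """Iteratively refine P(positive) for every node (simplified, hypothetical).

    adj: node -> neighbours; local_prob: node -> P(positive) from a local classifier
    (assumed given); corr: node -> positive correlation degree, e.g. a
    personalized PageRank score. Labeled positives are clamped to 1.0.
    """
    prob = {v: (1.0 if v in positives else local_prob[v]) for v in adj}
    for _ in range(iters):
        new = {}
        for v in adj:
            if v in positives:
                new[v] = 1.0
                continue
            # Relational estimate: neighbours' current probabilities weighted by corr.
            weights = [(corr[u], prob[u]) for u in adj[v]]
            total = sum(w for w, _ in weights)
            relational = sum(w * p for w, p in weights) / total if total > 0 else local_prob[v]
            # Blend the local prediction with the relational estimate.
            new[v] = beta * local_prob[v] + (1 - beta) * relational
        change = max(abs(new[v] - prob[v]) for v in adj)
        prob = new
        if change < tol:
            break
    return prob
```

Because the relational term is a weighted average of values in [0, 1] and enters with weight `1 - beta < 1`, each sweep is a contraction, so the loop converges. On the chain 0-1-2 with node 0 positive and all local probabilities 0.5, node 1 ends up more likely positive than node 2, which is the kind of label dependency that independent per-node inference cannot express.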
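PUCI's final step mixes the non-negative and unbiased PU risk estimators. With class prior π, the unbiased risk is π·E_P[ℓ(g(x), +1)] + E_U[ℓ(g(x), −1)] − π·E_P[ℓ(g(x), −1)], and the non-negative variant clips the last two terms at zero. The abstract does not give the mixing rule, so the convex combination with weight `lam` below, the sigmoid loss, and the assumption that π is known are all hypothetical choices for illustration.

```python
import math


def sigmoid_loss(z, y):
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)) for score z and label y in {+1, -1}."""
    t = y * z
    if t >= 0:  # numerically stable form of 1 / (1 + exp(t))
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def mean(xs):
    return sum(xs) / len(xs)


def pu_risks(pos_scores, unl_scores, prior):
    """Return (unbiased_risk, nonnegative_risk).

    pos_scores: classifier scores on labeled positives; unl_scores: scores on
    unlabeled examples; prior: class prior pi = P(y = +1), ASSUMED known.
    """
    r_pos = prior * mean([sigmoid_loss(z, +1) for z in pos_scores])
    # Negative-class risk estimated from unlabeled data minus the positives' share.
    r_neg = mean([sigmoid_loss(z, -1) for z in unl_scores]) - prior * mean(
        [sigmoid_loss(z, -1) for z in pos_scores])
    return r_pos + r_neg, r_pos + max(0.0, r_neg)


def mixed_risk(pos_scores, unl_scores, prior, lam=0.5):
    """HYPOTHETICAL mixing rule: lam * nnPU + (1 - lam) * uPU."""
    unbiased, nonnegative = pu_risks(pos_scores, unl_scores, prior)
    return lam * nonnegative + (1.0 - lam) * unbiased
```

The clipping matters when a flexible model overfits: the unbiased estimate can then go negative (for example with scores `[3.0]` on positives and `[-5.0]` on unlabeled data at π = 0.5), while the non-negative estimate stays at zero or above.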
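Both experiments report an averaged F1 score across different positive labeling ratios. A minimal sketch of that metric, assuming F1 is computed on the positive class (label 1) and averaged with equal weight over runs, one run per labeling ratio; the function names are illustrative and the thesis may aggregate differently.

```python
def f1_score(y_true, y_pred):
    """F1 of the positive class (label 1) from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def averaged_f1(runs):
    """Equal-weight mean F1 over runs, e.g. one (y_true, y_pred) pair per labeling ratio."""
    return sum(f1_score(t, p) for t, p in runs) / len(runs)
```

For example, a run with one true positive, one false positive and one false negative has F1 = 0.5; averaged with a perfect run (F1 = 1.0) this gives 0.75.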
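GMI-Learning fuses structural, edge and node information into a similarity index between unlabeled and positive graphs, then ranks the unlabeled graphs to pick reliable negatives. The abstract does not define the index, so this is a hypothetical sketch: each graph is split into three multiset views (node labels, edge labels, degree sequence), the views are compared with multiset Jaccard similarity and fused with assumed equal weights, and the `k` unlabeled graphs with the lowest mean similarity to the positives are taken as reliable negatives.

```python
from collections import Counter


def multiset_jaccard(a, b):
    """Jaccard similarity of two Counters (1.0 if both are empty)."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


def graph_views(g):
    """Split a graph into node-label, edge-label and degree (structure) views.

    g = {"nodes": [label, ...], "edges": [(u, v, edge_label), ...]}, undirected.
    """
    degree = Counter()
    for u, v, _ in g["edges"]:
        degree[u] += 1
        degree[v] += 1
    node_view = Counter(g["nodes"])
    edge_view = Counter(lbl for _, _, lbl in g["edges"])
    struct_view = Counter(degree[i] for i in range(len(g["nodes"])))
    return node_view, edge_view, struct_view


def similarity(g1, g2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted fusion of the three view similarities (weights are an ASSUMPTION)."""
    return sum(w * multiset_jaccard(a, b)
               for w, a, b in zip(weights, graph_views(g1), graph_views(g2)))


def reliable_negatives(positives, unlabeled, k):
    """Indices of the k unlabeled graphs least similar, on average, to the positives."""
    index = [sum(similarity(u, p) for p in positives) / len(positives) for u in unlabeled]
    return sorted(range(len(unlabeled)), key=lambda i: index[i])[:k]
```

Taking only the least similar graphs as negatives trades recall for label purity: the selected set is small, but it is less likely to contain hidden positives, which is what makes the resulting binary classification problem trustworthy.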
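The last GMI-Learning step obtains a graph-level representation with graph convolution and pooling and scores it with a multi-layer perceptron. A minimal pure-Python sketch of that pipeline, assuming a standard GCN layer ReLU(Â·H·W) with Â = D^(-1/2)(A + I)D^(-1/2), mean pooling as the readout, and a two-layer perceptron with a sigmoid output; the thesis's actual layer types, pooling operator and weights are not given, so the weights here are plain inputs rather than trained parameters.

```python
import math


def normalized_adjacency(n, edges):
    """A_hat = D^{-1/2} (A + I) D^{-1/2} for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def mean_pool(h):
    """Graph-level readout: average of the node embeddings."""
    return [sum(col) / len(h) for col in zip(*h)]


def mlp_score(z, w1, b1, w2, b2):
    """Two-layer perceptron returning P(positive); w1 has shape (len(z), hidden)."""
    hidden = [max(0.0, sum(zi * wij for zi, wij in zip(z, col)) + b)
              for col, b in zip(zip(*w1), b1)]
    logit = sum(hi * wi for hi, wi in zip(hidden, w2)) + b2
    return 1.0 / (1.0 + math.exp(-logit))
```

Mean pooling makes the graph embedding independent of node ordering and graph size, which is why a fixed-size MLP can classify graphs of different sizes such as those in MUTAG or the PTC datasets.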
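In the prototype system, a user's favorites act as positive nodes and every other item is unlabeled; recommendations then come from the unlabeled items the PU classifier scores highest. A minimal sketch of that final ranking step, assuming the classifier (for instance PUCI) has already produced a probability per item; `recommend` and its tie-breaking rule are hypothetical.

```python
def recommend(prob, favorites, k=5):
    """Return the k non-favourite items with the highest predicted P(positive).

    prob: item -> P(positive), e.g. the output of a PU node classifier.
    favorites: items the user has saved (the positive nodes).
    Ties are broken by item id so the output is deterministic.
    """
    candidates = [item for item in prob if item not in favorites]
    candidates.sort(key=lambda item: (-prob[item], item))
    return candidates[:k]


scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.7}
print(recommend(scores, favorites={"a"}, k=2))  # prints ['c', 'd']
```

Favorites are excluded before ranking because they are already known to the user; they still influence the result through the classifier, which treats them as the positive set.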
Keywords/Search Tags: collective inference, positive-unlabeled learning, node-level classification, graph-level classification