
Research On Heuristic Graph Data Augmentation Based On Graph Neural Networks

Posted on: 2024-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Liang
Full Text: PDF
GTID: 2568307064986039
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the development of machine learning and deep learning, neural networks have become an important tool for learning from and analyzing data. Different neural networks can effectively mine hidden feature information from regularly structured data such as images and text. However, with the rapid development of the Internet, a large amount of unstructured data is generated and linked into graphs with complex relationships, and neural networks designed for regularly structured data have had little success on such graph data. As an emerging deep learning tool, graph neural networks have been used to analyze various types of graph data and have received wide attention from researchers in recent years. Their key mechanism is to aggregate the spatial neighborhood features of target nodes and update the node states, exchanging feature information through message passing until all nodes in the graph reach a stable equilibrium state. Studies have shown that graph neural networks are highly capable of learning representations of graph-structured data, and they have made great progress in analysis tasks and applications such as node classification and link prediction.

Although graph neural networks have achieved relatively good performance in semi-supervised learning, they are heavily criticized for not making full use of unlabeled data and graph topology. In many real-world scenarios, graph data must be labeled manually, and most graph data carry no label information; such data in turn exhibit different graph distributions depending on their feature associations. At the same time, deep learning models are strongly affected by data quality and scale, and unreliable or scarce data can impair the learning process. As a result, message passing in a graph neural network becomes inefficient during training and the model falls into local optima, which leads to problems such as over-fitting and poor generalization. Data augmentation, a technique that generates more learnable data by transforming a finite set of labeled data, improves the generalization performance of graph neural network models and is considered an effective remedy for over-fitting. However, existing studies focus on designing graph data augmentation strategies for specific datasets. Such strategies are difficult to adapt to the data distributions of other datasets and cannot guarantee the consistency and diversity of the data distributions of the augmented and original graphs during training. Designing them also requires prior knowledge about the different graph distributions. Exploring a general augmentation strategy that adapts to different data distributions is therefore challenging.

Inspired by automatic search algorithms and by the use of graph data augmentation to mine deep features of different graph distributions, this thesis proposes a new heuristic data augmentation strategy framework. The framework heuristically and adaptively matches multiple graph data augmentation strategies to different data distributions, in order to address node classification on complex graph data and mitigate the risk of model over-fitting. Specifically, to enhance the performance of graph neural networks on complex graphs, model-agnostic augmentation strategies are first introduced from three perspectives (global, local, and labeling) using simple graph-theoretic knowledge. We define the key parameters that affect the effectiveness of each augmentation strategy as its operation magnitude, and initially assign each strategy a trainable weight, defined as its selection weight. Then, a candidate strategy search space is defined over augmentation strategies with different operation magnitudes and selection weights, and strategies with different combinations of magnitudes and weights serve as candidate strategies that balance the contributions of the different graph augmentation strategies. Finally, a distribution matching-based search algorithm freezes the parameters of the graph neural network trained on the original graph and uses it to validate the classification results on the augmented graph, thereby evaluating the augmentation strategies, and a Bayesian optimization search process adaptively explores the best candidate strategies. The framework is combined with different backbone networks and compared with other data augmentation baseline methods on five common datasets. The proposed method shows a significant improvement in classification accuracy (ACC) and improves the generalization ability of graph neural network models across different data distributions.

We further propose a new multi-view graph contrastive learning method based on heuristic view generation. Specifically, the method constructs a data augmentation strategy based on node feature perturbation under an unsupervised task setup, and uses the adaptive selection module to create the best contrastive views for different data distributions, making full use of graph structure information. Finally, a regularization term corrects the deviation between the adaptively constructed approximate sampling distribution and the actual sampling. We compare the method with advanced unsupervised and partially supervised methods on three commonly used datasets, and it also achieves excellent performance on the node classification task.
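The search procedure described in the abstract (candidate strategies defined by operation magnitudes and selection weights, scored by a frozen model on augmented graphs) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the toy graph, the two strategies `drop_edges` and `perturb_features`, the stand-in `frozen_model_predict` (one round of mean aggregation followed by a threshold, in place of a trained graph neural network), and the use of plain random search in place of the Bayesian optimization the abstract mentions.

```python
import random

# Toy undirected graph (made up): edges, one scalar feature per node, labels.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
FEATURES = [1.0, 0.9, 0.8, -0.8, -0.9, -1.0]
LABELS = [1, 1, 1, 0, 0, 0]


def drop_edges(edges, features, magnitude, rng):
    """Hypothetical strategy: drop each edge with probability `magnitude`."""
    kept = [e for e in edges if rng.random() >= magnitude]
    return kept, list(features)


def perturb_features(edges, features, magnitude, rng):
    """Hypothetical strategy: add uniform noise in [-magnitude, magnitude]."""
    noisy = [x + rng.uniform(-magnitude, magnitude) for x in features]
    return list(edges), noisy


STRATEGIES = [drop_edges, perturb_features]


def frozen_model_predict(edges, features):
    """Stand-in for a frozen GNN: average each node's feature with its
    neighbors' features (one message-passing step), then threshold at 0."""
    neighbors = {i: [i] for i in range(len(features))}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    preds = []
    for i in range(len(features)):
        h = sum(features[j] for j in neighbors[i]) / len(neighbors[i])
        preds.append(1 if h > 0 else 0)
    return preds


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def score_candidate(candidate, rng, trials=20):
    """Average accuracy of the frozen model over sampled augmented graphs.
    A candidate is (selection_weights, operation_magnitudes)."""
    weights, magnitudes = candidate
    total = 0.0
    for _ in range(trials):
        # Pick one strategy according to its selection weight.
        idx = rng.choices(range(len(STRATEGIES)), weights=weights, k=1)[0]
        edges, feats = STRATEGIES[idx](EDGES, FEATURES, magnitudes[idx], rng)
        total += accuracy(frozen_model_predict(edges, feats), LABELS)
    return total / trials


def random_search(num_candidates=30, seed=0):
    """Random search over the candidate space (a simple substitute for
    Bayesian optimization, used here only to keep the sketch short)."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(num_candidates):
        raw = [rng.random() + 1e-6 for _ in STRATEGIES]
        weights = [w / sum(raw) for w in raw]          # normalized weights
        magnitudes = [rng.uniform(0.0, 1.0) for _ in STRATEGIES]
        score = score_candidate((weights, magnitudes), rng)
        if score > best_score:
            best, best_score = (weights, magnitudes), score
    return best, best_score
```

Calling `random_search()` returns the best `(weights, magnitudes)` pair found and its score. Note that this score rewards only agreement with the original labels, so it tends to favor small magnitudes; the abstract also calls for diversity between the augmented and original graphs, which this sketch does not model.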
Keywords/Search Tags: Data Augmentation, Graph Neural Networks, Contrastive Learning, Node Classification