Graphs are a ubiquitous data structure across many application fields. Real graph data are often high-dimensional and intractable, making it difficult to apply graph machine learning models directly. The goal of graph representation learning is to learn low-dimensional node embeddings in a form independent of any downstream task. The information in a graph is rich, and different downstream tasks require different information, so the learned low-dimensional embeddings must integrate information from the graph's different dimensions. However, existing self-supervised learning algorithms are, in practice, implicitly optimized for common downstream tasks. The learned low-dimensional embeddings therefore violate the principle that graph embedding generation should be independent of downstream tasks, and their performance drops sharply on many downstream tasks. Furthermore, these methods are usually built on strong homophily assumptions, which makes them hard to apply effectively to heterophily graphs. To address these problems, this paper proposes a novel self-supervised learning framework, the Multi-view Graph Encoder for Self-supervised Graph Representation Learning (MVGE). The main contributions of this paper are as follows: (1) Multi-view modeling based on data augmentation. This paper goes beyond the traditional homophily assumption and points out that modeling from different perspectives allows learning information from different dimensions of the graph. To preserve both the common and the distinct information between nodes, the paper generates two types of entities through data augmentation, ego features and aggregated features, and preserves the low-frequency and high-frequency signals in the graph by modeling these two entities. This facilitates learning on heterophily graphs without sacrificing accuracy on homophily tasks. (2) Diverse pretext tasks. To capture the information of different dimensions in the graph, the paper designs
three pretext tasks based on the node features and topological structure of the graph: ego-feature reconstruction, aggregated-feature reconstruction, and adjacency-matrix reconstruction. By optimizing these diverse pretext tasks, the model learns information from different dimensions of the graph. First, building on the two entities with different signals proposed in (1), the paper introduces two pretext tasks: ego-feature reconstruction and aggregated-feature reconstruction. Minimizing the KL divergence between the reconstructed and input feature distributions reduces the information loss of the low-dimensional embeddings and strengthens both low- and high-frequency signals. In addition, to enable the low-dimensional embeddings to support different downstream tasks, the paper introduces an adjacency-matrix reconstruction task that captures the topological information of the graph. (3) A novel multi-view self-supervised learning framework. To model the different signals and support the introduction of diverse pretext tasks, we design and implement a novel multi-view self-supervised learning framework. The framework models the two entities containing low-frequency and high-frequency signals, and guides the low-dimensional vectors to learn information from different perspectives by optimizing the different pretext tasks. The encoder of our framework contains three novel designs: a simpler linear encoding scheme, separate encoding of the different entities, and concatenation of the resulting representations. These three key designs help the model preserve the integrity and individuality of the different inputs and prevent the representations of different nodes from converging toward one another over multiple rounds of iterative learning. Finally, through comparative experiments on real-world datasets, the paper shows that our model achieves good performance on both homophily and heterophily networks, and that the learned low-dimensional embeddings can support different downstream tasks. At the same time, the paper
also conducts extensive ablation experiments on real-world and synthetic datasets, and the experimental analysis demonstrates the effectiveness of each part of the model's design.
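The two entity views in contribution (1) can be sketched as follows. This is a minimal illustration, not the paper's exact construction: the abstract does not specify the aggregation, so mean-neighbor aggregation is assumed here. The ego view keeps the raw, node-specific (high-frequency) features, while the aggregated view is a smoothed (low-frequency) signal.

```python
import numpy as np

def build_views(X, A):
    """Construct the two entities used for multi-view modeling.

    Assumptions (not stated in the abstract): aggregation is the mean of
    neighbor features, i.e. row-normalized adjacency times X.
    """
    deg = A.sum(axis=1, keepdims=True)      # node degrees
    A_norm = A / np.maximum(deg, 1)         # row-normalized adjacency
    ego = X                                 # ego-feature entity (high-frequency detail)
    agg = A_norm @ X                        # aggregated-feature entity (low-pass signal)
    return ego, agg
```

On a two-node graph with an edge between the nodes, the aggregated view of each node is simply its neighbor's feature vector, which makes the smoothing effect easy to verify by hand.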
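The pretext-task losses in contribution (2) could look like the following sketch. Two details are assumptions, since the abstract does not specify them: feature rows are turned into distributions with a softmax before the KL divergence is taken, and the adjacency matrix is reconstructed with a standard inner-product decoder trained with binary cross-entropy.

```python
import numpy as np

def _softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def kl_reconstruction_loss(X, X_hat):
    """KL divergence between input and reconstructed feature distributions,
    as in the ego- and aggregated-feature reconstruction tasks.
    Row-wise softmax normalization is an assumption of this sketch."""
    eps = 1e-12
    P, Q = _softmax(X), _softmax(X_hat)
    return float(np.mean(np.sum(P * np.log(P / (Q + eps) + eps), axis=1)))

def adjacency_reconstruction_loss(Z, A):
    """Adjacency-matrix reconstruction with an inner-product decoder and
    binary cross-entropy -- a common choice, assumed here because the
    abstract does not name the decoder."""
    eps = 1e-12
    A_hat = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))    # sigmoid of pairwise inner products
    return float(-np.mean(A * np.log(A_hat + eps)
                          + (1 - A) * np.log(1 - A_hat + eps)))
```

A quick sanity check: reconstructing the input features exactly drives the KL term to (numerically) zero, while any imperfect adjacency reconstruction yields a strictly positive cross-entropy.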
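The three encoder designs named in contribution (3) (a simple linear encoding scheme, separate encoders per entity, and concatenation) can be combined into one small sketch. Dimensions, initialization scale, and class structure are illustrative choices, not the paper's implementation.

```python
import numpy as np

class MultiViewEncoder:
    """Hypothetical sketch of the MVGE encoder design: each entity view is
    projected by its own linear encoder, and the two codes are concatenated
    so neither view's signal is averaged away."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_ego = rng.standard_normal((in_dim, out_dim)) * 0.1  # encoder for ego view
        self.W_agg = rng.standard_normal((in_dim, out_dim)) * 0.1  # encoder for aggregated view

    def encode(self, ego, agg):
        z_ego = ego @ self.W_ego                        # encode ego features separately
        z_agg = agg @ self.W_agg                        # encode aggregated features separately
        return np.concatenate([z_ego, z_agg], axis=1)   # concatenation preserves both signals
```

Keeping the encoders separate and concatenating, rather than summing, is what lets the final embedding retain both the high-frequency (ego) and low-frequency (aggregated) components side by side.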