Graphs are a highly expressive data structure, and graph-structured data is prevalent in the real world: social networks, biological networks, citation networks, and so on. Mining and analyzing information from real-world networks plays a crucial role in many emerging applications. Graph representation learning has therefore emerged as a family of techniques that learn latent, low-dimensional node representations while preserving graph topology, node properties, and other side information. A high-quality node representation should be sufficiently expressive, i.e., it should preserve enough meaningful information (e.g., feature, structure, and label information) from the original graph-structured data. The learned node representations can then be applied to downstream graph analysis tasks such as node classification, graph classification, link prediction, and visualization.

According to the type of model used, graph representation learning methods fall into two broad categories: shallow graph embedding models and graph neural network models. Shallow graph embedding models use embedding lookup tables and update the node representations directly from graph structure information, whereas graph neural networks use node feature information more effectively by applying a deep neural network architecture to extract meaningful properties of the graph.

This paper studies the following four problems in these two types of graph representation learning methods:

(1) Processing of non-homophily graphs: Although graph representation learning algorithms have achieved excellent performance on a large number of benchmark graph datasets, both shallow graph embedding methods and graph neural network methods rely on a homophily assumption, namely that adjacent nodes are more likely to have similar node features or
labels. In the real world, however, there are also scenarios where "opposites attract", which lead to graphs with heterophily: adjacent nodes may come from different classes or have different features. Most existing graph representation learning methods cannot handle heterophily graphs as well as they handle homophily graphs.

(2) Optimization of the negative sampling strategy: Negative sampling is an indispensable component of unsupervised graph representation learning. Current static negative sampling strategies, which draw from a uniform distribution or a node degree distribution, suffer from slow convergence and even vanishing gradients. The key to optimizing a negative sampling strategy is to achieve better task performance with as few negative samples as possible.

(3) Representation of hierarchical graphs: A major limitation of current graph neural network methods is that their architectures are inherently flat: they only propagate feature information along the edges of the graph and cannot infer and summarize information hierarchically. Extracting such hierarchical structure is indispensable for graph classification, because the goal of that task is to predict labels for the whole graph.

(4) Over-smoothing problem: Most current graph neural network models rely on the homophily assumption to smooth the representations of neighboring nodes and propagate their information. However, neither theory nor experiments show that increasing the number of propagation iterations by deepening the model yields more discriminative node representations; instead, it makes the node representations overly similar.

To address these four problems, this paper proposes new designs for learning high-quality node representations, which in turn improve the performance and efficiency of downstream graph analysis tasks. The main
contributions and innovations are as follows:

(1) A shallow graph embedding model based on local structure patterns: To improve the model’s ability to handle non-homophily graphs, we propose a shallow graph embedding model based on local structure patterns, where the patterns are captured using anonymous random walks. The model supplements the traditional CBOW model with local structure information, which significantly improves its effectiveness in node visualization, link prediction, and node classification tasks.

(2) A negative sampling strategy based on a dimensional mixture distribution: To speed up the training of unsupervised graph representation learning and avoid the vanishing gradient problem, we propose an adaptive negative sampling strategy based on a dimensional mixture distribution under the contrastive framework. The strategy efficiently samples semi-hard negative samples and can be adapted to any shallow graph embedding model or graph neural network model under the contrastive framework. Unsupervised graph representation learning models that use this strategy achieve significantly better performance on visualization and node classification tasks.

(3) A graph pooling method based on a graph multi-head attention mechanism: To improve the interpretability of hierarchical graph representations and their effectiveness in graph classification tasks, we propose a graph pooling method based on graph multi-head attention. By maximizing the mutual information between hierarchical graph representations and node representations, it significantly improves the effectiveness of graph classification and graph reconstruction tasks.

(4) A graph neural network model with enhanced locally adaptive smoothing: To improve the ability of graph neural network models to process graphs with heterophily and to alleviate the over-smoothing problem, we propose a graph neural network model with enhanced locally adaptive smoothing by jointly constraining node disagreement
in node representations and more trustworthy initial features. The model significantly improves node classification in multiple scenarios: homophily graphs, heterophily graphs, adversarial attacks, and long-distance dependence.
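
As an illustration of the anonymous random walks mentioned in contribution (1), the sketch below turns a random walk into its anonymous form: each node is replaced by the order in which it first appears, so walks over different nodes that trace the same local structure pattern map to the same sequence. This is a minimal sketch under stated assumptions, not the model proposed here; the toy graph `adj` and the helper names `random_walk` and `anonymize` are hypothetical and chosen only for illustration.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk visiting `length` nodes (fewer if it hits a dead end)."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def anonymize(walk):
    """Replace each node by the 1-based order of its first appearance in the walk."""
    first_seen = {}
    pattern = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen) + 1
        pattern.append(first_seen[node])
    return tuple(pattern)


# Hypothetical toy graph as an adjacency list (undirected).
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

rng = random.Random(0)
walk = random_walk(adj, "a", 5, rng)
print(walk, anonymize(walk))

# Two walks over different nodes share one structural pattern.
print(anonymize(["a", "b", "a", "c"]))  # (1, 2, 1, 3)
print(anonymize(["x", "y", "x", "z"]))  # (1, 2, 1, 3)
```

Because the pattern discards node identities, counting how often each pattern occurs around a node gives a description of its local structure that does not depend on which neighbors it has, only on how they are connected.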