| Heterophilic networks, composed of nodes and edges, can model the essential "likes repel, opposites attract" characteristic of a complex system. For example, in chemical molecular networks, molecules are more likely to be composed of atoms of different types; in online transaction networks, scammers are more likely to contact customers than other scammers. Network representation learning is a bridge between networks and machine learning. It leverages the edge and node information of the original network to project the entire network, or its nodes, into a low-dimensional space, so that downstream machine learning tasks such as node classification, clustering, and visualization can be performed more efficiently on low-dimensional, dense, real-valued vectors. To address the challenges of heterophilic network representation learning, this paper draws on network science and approaches the problem from two perspectives: models and networks. The main results and innovations of this paper are as follows: (1) As a preliminary study for heterophilic network representation learning, a comprehensive graph dimensionality reduction algorithm is designed. In linear dimensionality reduction, the algorithm is guaranteed to perform at least as well as the principal component analysis (PCA) algorithm; in nonlinear dimensionality reduction, it reveals the surface structure of manifolds. First, a Markov transition matrix is constructed for the high-dimensional data so that more similar nodes have higher transition probabilities. Then, the mapping that embeds the high-dimensional data into a low-dimensional space is optimized. The experimental results show that, in the linear dimensionality reduction of small-world networks, the proposed algorithm performs on par with the PCA algorithm, whereas the locally linear embedding (LLE) algorithm fails. The proposed algorithm is equivalent to the LLE 
algorithm in the nonlinear dimensionality reduction of a manifold network, whereas the PCA algorithm fails. (2) Current graph neural networks (GNNs) assume homophily between connected nodes and therefore cannot be applied directly to heterophilic networks. To address this problem, we propose NEDA, a novel GNN based on neighborhood expansion for heterophilic network representation learning. First, inspired by the modeling of infectious disease spreading dynamics, we expand the neighborhood of each node using the susceptible-infected (SI) model together with feature proximity. Second, in each training iteration, nodes are uniformly sampled from the expanded neighborhood for aggregation, which speeds up the optimization of the parameter matrices while using as much of the available training data as possible at minimal computational cost. Finally, node classification and clustering experiments, together with visualization of the node representations, are performed on benchmark heterophilic network datasets of varying sizes, and the results verify the effectiveness of our NEDA model. (3) The dominant paradigm in deep learning has been to download benchmark datasets and then design and train neural networks on them to improve the performance of downstream tasks. Model-centric GNNs have developed significantly under this paradigm. In some cases, however, it is now more productive to fix the neural network architecture and instead look for ways to preprocess the networks with the help of domain knowledge and network science. This approach, known as data-centric GNNs (D_c_GNNs), remains under-explored. On this basis, we propose a novel method for transforming the original structure of heterophilic networks so as to improve the node classification performance of GNNs. First, a machine learning toolkit is used to train edge classifiers that predict whether the nodes at the two ends of an edge have the same label. Then, we use these classifiers to transform the structure of heterophilic networks and improve their homophily. Finally, we 
conduct node classification experiments on benchmark heterophilic network datasets of varying sizes, and the experimental results demonstrate the effectiveness of our D_c_GNN. As complex systems evolve, networks become increasingly complex. This paper’s research on heterophilic network representation learning offers a new approach to applying network science to representation learning on other kinds of networks, such as heterogeneous information networks, dynamic networks, and hypergraphs. |
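
The structure-transformation idea in contribution (3) can be illustrated with a toy sketch. The abstract does not specify how homophily is measured or how the edge classifiers' predictions are used, so everything below is an assumption made for illustration: homophily is taken to be the fraction of edges whose endpoints share a label, the transformation simply drops edges predicted to be heterophilic, and `predict_same` is a hypothetical stand-in for a trained edge classifier.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_homophily(edges: Iterable[Edge], labels: Dict[Hashable, Hashable]) -> float:
    """Fraction of edges whose two endpoints carry the same label.

    Assumption: this edge-level ratio is used as the homophily measure;
    the abstract does not say which measure the authors use.
    """
    edges = list(edges)
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def keep_predicted_homophilic(
    edges: Iterable[Edge],
    predict_same: Callable[[Hashable, Hashable], bool],
) -> List[Edge]:
    """Keep only the edges that the (hypothetical) classifier predicts
    to connect same-label nodes. Dropping the rest is one simple,
    assumed way of transforming the structure."""
    return [(u, v) for u, v in edges if predict_same(u, v)]


# Toy heterophilic graph: most edges join nodes with different labels.
edges = [(0, 1), (0, 2), (1, 3), (1, 2), (0, 3)]
labels = {0: "a", 1: "b", 2: "b", 3: "a"}

# Hypothetical classifier: predicts "same label" when a 1-D node
# feature differs by less than a threshold (illustrative values only).
features = {0: 0.1, 1: 0.9, 2: 0.8, 3: 0.3}


def predict_same(u: Hashable, v: Hashable) -> bool:
    return abs(features[u] - features[v]) < 0.3


rewired = keep_predicted_homophilic(edges, predict_same)
print("homophily before:", edge_homophily(edges, labels))
print("homophily after: ", edge_homophily(rewired, labels))
```

In practice the predictor would be a classifier trained on edges whose endpoint labels are known, and its errors would leave some heterophilic edges in place; the sketch only shows why filtering edges by predicted label agreement tends to raise homophily.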