| At present,networks are ubiquitous in the real world,and which are represented as a set of nodes and edges.But,this way is not conducive to the networks for various analysis tasks.A traditional method is to represent the nodes as one-hot encoding,which can express the relationship between nodes and edges visually,but it will cause space waste due to sparsity.To solve this problem,an effective method is to represent the nodes as low-dimensional,dense,real-valued vector forms in the network.The existing network representation learning methods can be divided into three broad categories:methods based on matrix factorization,random walk and deep learning.However,the methods based on matrix factorization consider only local structure of the network,the methods based on random walk pay more attention to the global structure of the network,and the methods based on deep learning are poorly interpretability.To solve the problems of simultaneously preserving the global and local structures of the network and enhancing the interpretability of the model,this paper focuses on the application of random walk with restart in network representation learning.The main work and innovations of this paper are as follows:1.To solve the problem that the network representation learning methods cannot preserve both the local and global structure of the network,this paper proposes a network representation learning method based on random walk with restart.First,this method measures the similarity between nodes by the attention mechanism,and assigns appropriate weights to edges according this similarity.Then,the global and local structures of the network are captured simultaneously by random walk with restart,and the diffusion probability of nodes is generated which can be viewed as a true probability distribution of nodes.Finally,an objective function is constructed by KL-divergence to minimize the difference between the real probability distribution of nodes and to learn the optimal node vector representation.To further improve the learning efficiency of algorithm,we apply the negative sampling and edge sampling to optimize the objective function.Experiments on real datasets show that network representation learning method based on random walk with restart can extract node feature information effectively.2.The application of homogeneous network representation learning methods in heterogeneous networks will cause the loss of node information and the vector representations learned cannot extract the complete semantic information of nodes.For these reasons,this paper proposes a representation learning method for heterogeneous networks to obtain the topology structure and rich semantic information.First,according to the importance of nodes,the heterogeneous network is decomposed into an homogeneous network and multiple bipartite networks.Random walk with restart can capture the structure and semantic information in all subnetworks.Due to the limitation of adjacency matrix dimension in bipartite network,random walk with restart cannot be directly applied.In order to eliminate this limitation,Jaccard conefficient is used to transform the adjacency matrix.To improve the effect of random walk with restart computing the diffusion probability of nodes,this method combined with the attention mechanism to measure the similarity between important nodes and assign weights to edges.Finally,the representations of nodes are obtained by fusion learning of multiple subnetworks.Random walk with restart can capture the complete topology structure of network,and the fusion learning process can obtain rich semantic information of nodes.Therefore,the heterogeneous network representation learning method proposed in this paper can effectively learn node representations containing network structure attributes and semantic information.The effectiveness of the proposed method is verified by comparison experiments on heterogeneous network datasets.3.Aiming at the difficulty of applying random walk with restart in large-scale networks,this paper introduces matrix partitioning inverse to optimize random walk with restart,and proposes an improved representation learning method for random walk with restart.In this method,to effectively invert the partitioned matrix,the Slash Burn method is first used to reorder the nodes in the network.After sorting,the adjacency matrix of network can generate a sparse block diagonal submatrix.Then,the inverse of the partitioned matrix is carried out.The application of Schur complement and LUdecomposition in the partitioned matrix can accelerate the inverse process.Finally,based on the inverse of matrix,the diffusion probability of nodes can be calculated and applied to learn the low-dimensional vector representations of nodes.The experimental results show that the matrix partitioning inverse can effectively reduce the requirements of random walk with restart on storage space and running time without changing the preserving ability of global and local structures of network.In this paper,random walk with restart can preserve the global and local structures of network and promote network representation learning method to learn nodes representations that contain rich and complete network information,and further improve the subsequent network analysis task.Through theoretical and experimental analysis,it is further proved that random walk with restart can promote the capture of the local and global structures of the network. |