
Random Walk And Autoencoder Based Heterogeneous Information Network Representation Learning

Posted on: 2022-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Huang
Full Text: PDF
GTID: 2518306605471404
Subject: Master of Engineering
Abstract/Summary:
Networks are widely used to model complex systems and to express relationships among interconnected entities. With the rapid development of big data technology and the continuous growth of network scale, heterogeneous information networks (HINs) have been adopted to characterize data with multiple node types and diverse relations, such as citation networks, biological networks and social networks. Traditional network analysis methods rely on high-dimensional sparse vectors to capture network information and therefore suffer from expensive computational costs. Efficient network embedding not only reduces this cost but also facilitates various downstream applications such as node classification, node clustering and recommendation, and thus has high research value.

Heterogeneous information network representation aims to encode networks into a low-dimensional embedding space. By devising efficient indexes or parallel algorithms over this embedding space, large-scale heterogeneous information networks can be analyzed. Moreover, deep learning and nonlinear dimensionality reduction techniques are increasingly applied to learn network features, reducing redundant information while preserving the structural information of the network. For heterogeneous information networks, the goal of representation learning is to integrate network topology and heterogeneous information into node embedding vectors simultaneously.

This thesis presents two methods, based on random walks and autoencoders respectively, for learning low-dimensional dense vector representations of heterogeneous information networks. The main contributions are as follows:

(1) Random-walk-based representation learning for heterogeneous information networks: To effectively exploit the semantic and structural information of the network, this method proposes a meta-path weight learning scheme and an improved random walk strategy. First, a set of meta-paths is selected from the network schema, and the normalized weight of each meta-path is obtained by computing similarity losses over labeled nodes. Second, two walk patterns are proposed to guide the random walk process; by controlling the probability with which each pattern is selected, the generated context sequences capture both the semantic information carried by the meta-paths and the structural dependencies beyond them. The context sequences are then fed into a Skip-gram model to learn a network embedding under each meta-path. Finally, the overall network embedding is synthesized from the per-meta-path embeddings according to the learned meta-path weights. Experimental results on multiple datasets demonstrate that the proposed algorithm effectively improves the accuracy of network analysis tasks.
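As a rough illustration of the walk-and-Skip-gram stage described above, the sketch below assumes a networkx-style graph whose nodes carry a "type" attribute and uses gensim's Skip-gram implementation. The function names, the pattern-selection probability p_follow and all hyperparameters are assumptions for illustration rather than the thesis's actual implementation, and the meta-path weight learning and weighted aggregation steps are omitted.

# Illustrative sketch (not the thesis's implementation): a meta-path-guided
# random walk with a tunable probability of leaving the meta-path pattern,
# followed by Skip-gram training on the generated context sequences.
import random
from gensim.models import Word2Vec

def metapath_walk(graph, start, metapath, walk_length, p_follow=0.8):
    # With probability p_follow the next hop must match the node type dictated
    # by the meta-path; otherwise any neighbor is allowed, which injects
    # structural dependencies beyond the meta-path into the context sequence.
    cycle = metapath[:-1] if metapath[0] == metapath[-1] else metapath
    walk = [start]
    for step in range(walk_length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        if random.random() < p_follow:
            wanted = cycle[(step + 1) % len(cycle)]
            candidates = [n for n in neighbors if graph.nodes[n]["type"] == wanted]
            if not candidates:
                break
            walk.append(random.choice(candidates))
        else:
            walk.append(random.choice(neighbors))
    return [str(n) for n in walk]

def embed_under_metapath(graph, metapath, num_walks=10, walk_length=40, dim=128):
    # Generate context sequences starting from nodes of the meta-path's first
    # type and feed them to Skip-gram (sg=1) to get one embedding per node.
    starts = [n for n, d in graph.nodes(data=True) if d["type"] == metapath[0]]
    corpus = [metapath_walk(graph, s, metapath, walk_length)
              for _ in range(num_walks) for s in starts]
    model = Word2Vec(corpus, vector_size=dim, window=5, sg=1, min_count=0, workers=4)
    return {n: model.wv[str(n)] for n in graph.nodes if str(n) in model.wv}

In the full method, one such embedding would be learned per selected meta-path, and the results combined using the normalized meta-path weights obtained from the labeled nodes.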
(2) Autoencoder-based representation learning for heterogeneous information networks: This method proposes a unified representation learning framework that captures global and local information simultaneously, with a dedicated embedding component for each type of information. For global information, the heterogeneous information network is reconstructed into multiple homogeneous networks based on symmetric meta-paths, and an autoencoder is trained separately on each homogeneous network to learn its second-order similarity embedding. For local information, a balanced neighborhood sampling method is proposed to generate subgraphs, and a nearest-neighbor embedding is learned by estimating the co-occurrence probability of central nodes and context nodes within each subgraph. Finally, the two kinds of embeddings are aggregated into the final network embedding. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed framework.
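For the global component, the sketch below shows one possible way to train an autoencoder on the adjacency rows of a single meta-path-based homogeneous network, so that nodes with similar neighborhoods (second-order similarity) obtain similar codes. It uses PyTorch; the layer sizes, loss and training loop are assumptions for illustration and not the thesis's actual architecture, and the balanced neighborhood sampling component for local information is not shown.

# Illustrative sketch (not the thesis's architecture): a simple autoencoder
# that reconstructs each node's adjacency row from one meta-path-based
# homogeneous network; the bottleneck code serves as that node's
# second-order similarity embedding.
import torch
import torch.nn as nn

class SecondOrderAutoencoder(nn.Module):
    def __init__(self, num_nodes, hidden_dim=256, embed_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_nodes, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_nodes),
        )

    def forward(self, adj_rows):
        codes = self.encoder(adj_rows)
        return codes, self.decoder(codes)

def train_second_order(adj, epochs=200, lr=1e-3):
    # adj: dense (num_nodes x num_nodes) float tensor, the adjacency matrix of
    # one homogeneous network built from a symmetric meta-path.
    model = SecondOrderAutoencoder(adj.shape[0])
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        codes, recon = model(adj)
        loss = ((recon - adj) ** 2).mean()  # reconstruction error as training signal
        optim.zero_grad()
        loss.backward()
        optim.step()
    with torch.no_grad():
        embeddings, _ = model(adj)
    return embeddings  # one embedding row per node, for this meta-path

In the full framework, one such autoencoder would be trained per symmetric meta-path, and the resulting global embeddings aggregated with the local nearest-neighbor embeddings learned from the sampled subgraphs.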
Keywords/Search Tags:Heterogeneous information network, Network representation learning, Random walk, Autoencoder, Skip-gram model