Heterogeneous information networks (HINs) are ubiquitous in the real world, including citation, biological, and social networks, among others. These networks are not only large in scale but also composed of different types of nodes and edges, which carry rich semantic information. Network embedding learns low-dimensional vector representations of nodes and edges that capture the rich structural and semantic information of the network. High-quality embeddings can effectively improve the performance of downstream machine learning tasks (e.g., node classification, node clustering, link prediction). Although research on network embedding has made rapid progress in recent years, most existing embedding methods are designed for homogeneous networks and give little consideration to heterogeneous information networks. This thesis focuses on how to learn low-dimensional dense node vectors in heterogeneous information networks. It explores, step by step, how to preserve non-linear structure and high-order interactions under simple and complex network patterns respectively, how to fuse multiple views in a dynamic environment to adapt to network changes, and how to integrate multi-modal content in multi-source heterogeneous scenarios to enhance generalization ability. A series of heterogeneous information network embedding methods for static, dynamic, complex-pattern, and multi-source heterogeneous scenarios are then proposed, providing a reliable foundation for accurate analysis and application of heterogeneous information networks. The main contributions of this thesis are as follows:

(1) To preserve the non-linear network structure of heterogeneous information networks and learn node representations with rich semantics, we first design an extensible semantic description structure called the Composite Meta-Graph (CMG). With this structure, users do not need to worry about selecting an appropriate
meta-path or meta-graph. Based on the CMG, rich semantic relations and structural contexts between nodes of different types and at different distances can be described accurately. Moreover, a CMG-based heterogeneous information network embedding framework, CMG2Vec, is proposed. By extending the auto-encoder to the heterogeneous network setting, CMG2Vec embeds the multi-order proximities between nodes learned from the CMG into latent representations through a series of non-linear encoding-decoding mappings. During fusion, an attention mechanism automatically learns the weights of these latent vectors, which enables each final node representation to focus on the proximity of the most informative order. Experimental results on three large-scale datasets demonstrate that our method outperforms existing approaches on three network mining tasks: node classification, node clustering, and node similarity search.

(2) To explore heterogeneous network embedding that preserves high-order interactions, we extend graph neural networks to heterogeneous graphs and propose a novel high-order Symmetric Relation based Heterogeneous Graph Attention Network, denoted SR-HGAT. SR-HGAT first identifies the latent semantics underlying the observed explicit symmetric relations, guided by different meta-paths and meta-graphs in a heterogeneous graph. A nested propagation mechanism that aggregates the semantic and structural features carried by different links is then designed to calculate the interaction strength of each symmetric relation. At the core of the model, a two-layer attention mechanism learns the importance of different neighborhood information as well as the weights of different symmetric relations, so as to comprehensively capture both structural and semantic features. Extensive experimental results offer insights into the efficacy of the proposed model and demonstrate that it significantly outperforms
state-of-the-art baselines across three benchmark datasets on various downstream tasks.

(3) To learn vector representations that adapt to dynamic network changes, we propose a novel framework for incorporating temporal information into HIN embedding, named multi-view dynamic HIN embedding (MDHNE), which efficiently preserves the evolution patterns of implicit relationships from different views when updating node vectors over time. We first transform the HIN into a series of homogeneous networks corresponding to different views. MDHNE then applies a recurrent neural network (RNN) to incorporate the evolving patterns of complex network structure and semantic relationships between nodes into latent embedding spaces, so that the node vectors from multiple views can be learned and updated as the HIN evolves over time. Moreover, we devise an attention-based fusion mechanism that automatically infers the weights of the latent vectors corresponding to different views by minimizing an objective function specific to each mining task. Extensive experiments demonstrate that MDHNE outperforms state-of-the-art baselines on three real-world dynamic datasets for different network mining tasks.

(4) To explore heterogeneous network embedding in complex-pattern scenarios, we propose an association rules enhanced knowledge graph attention network (AR-KGAN). First, an automatic rule mining algorithm is designed to select the association rules whose confidence exceeds a threshold, with the aim of using the rich information in logic rules to improve the accuracy of knowledge reasoning and alleviate the sparsity of the knowledge base. Then, an effective neighborhood aggregator is proposed, which addresses these problems by aggregating neighbors with both rule-based and graph-based attention weights. Additionally, the model encapsulates the representations of multi-hop neighbors of nodes to refine their embeddings. A logic-like inference
pattern is used as a constraint for knowledge graph embedding. The global loss is then minimized over both atomic and complex formulas to learn the embeddings. In this manner, we learn embeddings compatible with both triplets and rules, which are more predictive for knowledge acquisition and inference. We conduct extensive experiments on two benchmark datasets, WN18RR and FB15k-237, for two knowledge graph completion tasks to evaluate AR-KGAN. The results show that AR-KGAN achieves significant and consistent improvements over state-of-the-art methods.

(5) To fuse the node content of multi-source heterogeneous information networks and further capture event semantics in node representations, a knowledge representation learning method based on multi-modal missing data fusion is proposed. First, multilayer knowledge networks are used to model the specific combinations of different modalities. Then, a transform learning framework based on deep neural networks is designed to extract features from incomplete multi-modal heterogeneous data, and a highly non-linear feature interaction function is used to capture the association information between multi-source heterogeneous data. Finally, a multi-modal knowledge graph attention network is proposed to map the extracted incomplete multi-modal information and the interaction information between modalities to a low-dimensional space, so as to capture comprehensive multi-modal semantic information in the latent embedding space. A clear advantage of the proposed model is that it does not require every entity in the knowledge graph to have complete multi-modal information; that is, it does not require a particularly high-quality knowledge base. In addition, the framework scales well when additional modal information needs to be incorporated.
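
To illustrate how an attention-based fusion can tolerate entities with incomplete multi-modal information, the following NumPy sketch masks absent modalities before the softmax, so that only available modalities receive weight. It is a minimal illustration under stated assumptions and not the model proposed in this thesis: the function name `masked_attention_fusion`, the single query vector (which would be learned in a real model), and the masking strategy are all hypothetical.

```python
import numpy as np


def masked_attention_fusion(modal_vecs, present, query):
    """Fuse per-modality vectors with attention, skipping missing modalities.

    modal_vecs: (n_modalities, dim) array; rows of missing modalities may hold
                any placeholder value, including NaN.
    present:    (n_modalities,) boolean mask, True where the modality exists.
    query:      (dim,) attention vector (hypothetical; learned in practice).
    Returns the fused (dim,) vector and the (n_modalities,) attention weights.
    """
    modal_vecs = np.asarray(modal_vecs, dtype=float)
    present = np.asarray(present, dtype=bool)
    if not present.any():
        raise ValueError("at least one modality must be present")

    # Replace placeholder rows with zeros so they cannot leak into the result.
    clean = np.where(present[:, None], modal_vecs, 0.0)

    # Dot-product attention scores; missing modalities get a score of -inf.
    scores = np.where(present, clean @ query, -np.inf)

    # Numerically stable softmax: exp(-inf) is 0, so missing weights vanish.
    scores = scores - scores[present].max()
    weights = np.exp(scores)
    weights = weights / weights.sum()

    return weights @ clean, weights


if __name__ == "__main__":
    # Hypothetical entity with text and image vectors but no audio vector.
    vecs = np.array([[1.0, 0.0], [0.0, 1.0], [np.nan, np.nan]])
    fused, weights = masked_attention_fusion(
        vecs, [True, True, False], np.array([0.3, 0.1])
    )
    print("weights:", weights)
    print("fused:", fused)
```

Because the mask is applied per entity, adding a further modality in this sketch only adds one row to `modal_vecs` and one entry to `present`.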