Graph representation learning is an important foundation for downstream tasks in mining knowledge from network data. Traditional graph representation learning methods have made great progress. However, researchers have found that in real-world graphs, the objects and components that make up the graph are not necessarily all of the same type. Research on representation learning methods for such heterogeneous graphs therefore opens new opportunities for graph data mining. Heterogeneous graph representation learning has advanced significantly in recent years, but several problems remain:
- Traditional random walk-based heterogeneous graph representation learning methods struggle to balance semantics and structure. They cannot simultaneously account for the semantic region features of nodes and the structural features of the network.
- Traditional heterogeneous graph representation learning methods require expert prior knowledge to guide the random walk strategy.
- Traditional graph neural network-based heterogeneous graph representation learning methods cannot learn global node features of the graph. They also inevitably lose knowledge when converting heterogeneous graphs into homogeneous graphs.

Against this background, this dissertation studies representation learning algorithms for heterogeneous graphs. It consistently fuses and exploits the structural and semantic information in heterogeneous graphs to improve the performance of representation learning algorithms. The main research comprises the following three parts.

(1) Existing representation learning methods obtain node sequences in heterogeneous graphs through meta-path-guided random walks, and these sequences struggle to balance semantics and structure. To address this, the TSDW algorithm is proposed to embed the context paths between two nodes in a temporal heterogeneous graph. Experimental results show that TSDW outperforms existing random walk-based graph representation learning methods on both node classification and clustering tasks. This part also analyses how the relevant parameters of TSDW affect its performance.

(2) Node sequences captured by random walk-based heterogeneous graph representation learning methods cannot simultaneously capture two kinds of features: the semantic region features of a node's proximal neighbours, and the structural features of nodes deep in the network. To address this, this dissertation presents a generalized motif-based higher-order representation learning algorithm (MBRep). MBRep is evaluated on three real datasets using a link prediction task. Its AUC and MRR are higher than those of existing random walk-based graph representation learning methods. A cold-start test is also performed to assess how well the algorithm adapts when new nodes and links are added to the graph.

(3) Current meta-path-based graph neural network models have difficulty learning global node representation vectors of heterogeneous networks. In addition, using meta-path guidance to convert heterogeneous graphs into homogeneous graphs inevitably loses information. To address both problems, an end-to-end motif-based hierarchical attention graph neural network model (MBHAN) is presented for global node representation learning in heterogeneous graphs. The model combines two levels of attention: a node-level attention mechanism and a motif subgraph-level attention mechanism. This part evaluates MBHAN on two real datasets through node classification and clustering tasks. Both its F1 and NMI scores are higher than those of existing state-of-the-art graph representation learning methods. The sensitivity of the algorithm's hyperparameters is also tested.

In summary, three algorithms fuse structural and semantic information at different granularities during heterogeneous graph representation learning:
- TSDW fuses the structural and semantic information of context paths.
- MBRep fuses the structural and semantic information of motif instances.
- MBHAN fuses the structural and semantic information within motif subgraphs.

The experimental results show that taking structure and semantics into account simultaneously can effectively improve the performance of graph representation learning algorithms. This approach provides new ideas and insights for research on heterogeneous graph representation learning.
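The meta-path-guided random walk that parts (1) and (2) start from can be illustrated with a short sketch. This is the generic baseline technique in the style of metapath2vec, not TSDW or MBRep. The toy author/paper/venue graph, the meta-path `"APVPA"` and the helper names `build_adjacency` and `metapath_walk` are assumptions made for illustration. In such a walk, the node types visited are fixed entirely by a hand-chosen meta-path. This is the expert prior knowledge the abstract refers to.

```python
import random

# Hypothetical toy heterogeneous graph (assumption, for illustration only).
# Node types: "A" = author, "P" = paper, "V" = venue.
NODE_TYPE = {
    "a1": "A", "a2": "A", "a3": "A",
    "p1": "P", "p2": "P", "p3": "P",
    "v1": "V", "v2": "V",
}
EDGES = [
    ("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p3"),
    ("p1", "v1"), ("p2", "v1"), ("p3", "v2"),
]


def build_adjacency(edges):
    """Undirected adjacency lists built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk whose i-th node must match the i-th type of a repeated meta-path.

    The meta-path (e.g. "APVPA") must start and end with the same type, so its
    prefix ("APVP") can be repeated along the walk.
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same node type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type does not match the meta-path")
    cycle = metapath[:-1]
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        # Only neighbours of the type required at this step are eligible.
        candidates = [n for n in adj.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:  # dead end: no neighbour of the required type
            break
        walk.append(rng.choice(candidates))
    return walk


adj = build_adjacency(EDGES)
walk = metapath_walk(adj, NODE_TYPE, "a1", "APVPA", 9, random.Random(0))
print(walk)
```

In metapath2vec-style methods, many such walks are then fed to a skip-gram model to learn node embeddings. Changing the meta-path changes which semantics the walks, and hence the embeddings, emphasise.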
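MBRep and MBHAN both work with motif instances, which are occurrences of a small typed subgraph pattern in the heterogeneous graph. The sketch below enumerates instances of one simple motif, a triangle, and groups them by the node types they contain. It is only an illustration of what a typed motif instance is, not MBRep's generalized motif definition. The triangle motif, the type-signature key, the toy graph and the function `typed_triangles` are assumptions.

```python
from itertools import combinations

# Hypothetical toy graph: author-author collaboration edges plus author-paper edges.
NODE_TYPE = {"a1": "A", "a2": "A", "a3": "A", "p1": "P", "p2": "P"}
EDGES = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2"),
]


def build_adjacency(edges):
    """Undirected adjacency sets (sets give O(1) edge-membership tests)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def typed_triangles(adj, node_type):
    """Enumerate every triangle exactly once and group instances by node-type signature."""
    instances = {}
    for u in sorted(adj):
        # Only pair neighbours that sort after u, so each triangle is found from its smallest node.
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                signature = "".join(sorted(node_type[x] for x in (u, v, w)))
                instances.setdefault(signature, []).append((u, v, w))
    return instances


instances = typed_triangles(build_adjacency(EDGES), NODE_TYPE)
counts = {sig: len(found) for sig, found in instances.items()}
print(counts)  # {'AAA': 1, 'AAP': 2}
```

Keying instances by type signature is what separates a heterogeneous motif from a homogeneous one. Two triangles with the same shape but different node types count as different motifs.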
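Part (2) reports AUC and MRR for link prediction. The sketch below computes both metrics from their standard definitions.
- **AUC** is the probability that a true link scores higher than a non-link.
- **MRR** is the mean of 1/rank of the true link among its candidates.

The abstract does not describe the evaluation protocol. The tie handling (0.5 for AUC ties, optimistic rank for MRR ties), the score values and the helper names `auc` and `mrr` are therefore assumptions.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive link outscores a random negative one (ties count 0.5)."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def mrr(queries):
    """Mean reciprocal rank; each query is (true_link_score, [negative candidate scores])."""
    reciprocal = []
    for true_score, negatives in queries:
        # Rank 1 means no negative candidate scored strictly higher (optimistic on ties).
        rank = 1 + sum(1 for s in negatives if s > true_score)
        reciprocal.append(1.0 / rank)
    return sum(reciprocal) / len(reciprocal)


# Hypothetical model scores, for illustration only.
auc_value = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.5])
mrr_value = mrr([(0.9, [0.7, 0.3]), (0.4, [0.5, 0.7, 0.3]), (0.6, [0.8, 0.2])])
print(round(auc_value, 4), round(mrr_value, 4))  # 0.7778 0.6111
```

AUC scores the global separation of true links from non-links. MRR instead rewards placing each true link near the top of its own candidate list, so the two metrics can disagree.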
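Part (3) describes MBHAN as combining node-level and motif subgraph-level attention. The sketch below shows only the generic two-level pattern, under several loud assumptions:
- Plain dot-product scores stand in for the learnable projections and nonlinearities that a trained model would use.
- `subgraph_query` is a fixed, hypothetical query vector.
- `m1` and `m2` are hypothetical motif names.
- The function `hierarchical_attention` is illustrative.

It is not MBHAN's formulation, which this abstract does not give.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def hierarchical_attention(target, motif_neighbours, subgraph_query):
    """Two-level attention: node level inside each motif subgraph, then across subgraphs.

    target: embedding of the node being represented (used as the node-level query).
    motif_neighbours: {motif_name: [neighbour embeddings in that motif subgraph]}.
    subgraph_query: hypothetical fixed query vector scoring the per-motif summaries.
    """
    names = sorted(motif_neighbours)
    summaries = []
    for name in names:
        neighbours = motif_neighbours[name]
        alpha = softmax([dot(target, h) for h in neighbours])  # node-level weights
        summaries.append(weighted_sum(alpha, neighbours))
    beta = softmax([dot(subgraph_query, z) for z in summaries])  # subgraph-level weights
    return weighted_sum(beta, summaries), dict(zip(names, beta))


out, beta = hierarchical_attention(
    target=[1.0, 0.0],
    motif_neighbours={"m1": [[1.0, 0.0], [0.0, 1.0]], "m2": [[0.5, 0.5]]},
    subgraph_query=[0.0, 1.0],
)
print(out, beta)
```

The final node vector is a weighted mix of per-motif summaries, so the subgraph-level weights show which motif contributed most to a node's representation.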