| Graph structures allow complex systems in many real-world domains to be modeled abstractly, and research on graphs helps reveal the regularities underlying those systems. Node classification and link prediction, two major tasks in graph research, aim to predict the classes of nodes and the existence of edges between nodes, respectively, and have a wide range of real-world applications. Node classification and link prediction methods based on graph embedding techniques often outperform traditional methods, but complex heterogeneous graphs remain one of the current challenges for graph embedding. Existing heterogeneous graph embedding methods usually focus on local structural information under specific semantics while ignoring the structural and global information of the heterogeneous graph itself, so the resulting embeddings have limitations in specific tasks. To enable effective prediction on heterogeneous graphs, this thesis investigates graph embedding techniques, introduces and analyzes classical graph neural network embedding models, and then proposes a heterogeneous graph embedding method that fuses the topological information of the heterogeneous graph with semantic information and maximizes mutual information, in order to perform node classification and link prediction on heterogeneous graphs. The main contents are as follows: (1) The thesis proposes THAN, a heterogeneous graph embedding model that uses heterogeneous graph topological information for feature enhancement, followed by information propagation under specific semantics. Pre-training on the heterogeneous graph topology yields feature representations of nodes, which are fused with the nodes’ own features; information is then propagated with attention mechanisms under specific semantics; and finally node feature representations are generated that can be used for downstream tasks. Experiments on the publicly available ACM dataset show that THAN outperforms the compared baseline models on both metrics in the node classification task, with an improvement of about 1%. In the link prediction task, THAN exceeds the compared baseline models on the main reference metric, AUC, by 3% to 4% across different training set percentages, indicating that the proposed method improves the quality of the node features generated by the graph embedding model and better serves specific tasks. (2) A heterogeneous graph contrastive learning method based on maximizing local and global mutual information is proposed and combined with the THAN model to form the THAN-CL model. THAN-CL optimizes the model parameters by jointly maximizing local and global mutual information and optimizing the task-specific objective. For link prediction, the model also offers strategies that can be flexibly selected at several steps, including a negative sampling strategy, a similarity measure strategy, and a target calculation strategy, to improve the performance and applicability of the model in different inference scenarios. Finally, experiments on the public dataset show that THAN-CL, with the contrastive learning module, improves on THAN by about 0.8% in the node classification task, indicating that the contrastive learning approach enables the model to generate higher-quality feature representations. In the link prediction task, THAN-CL achieves a 1% to 2% improvement, and the impact of the proposed strategies on the model’s link prediction performance is also compared. |
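
To make the link prediction evaluation described above more concrete, the following is a minimal sketch and not the thesis's implementation. It assumes uniform random negative sampling, dot-product and cosine similarity as two interchangeable similarity measures, and AUC computed as the fraction of positive/negative pairs ranked correctly. The toy embeddings and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def sample_negatives(nodes, positive_edges, k, rng):
    """Uniformly sample k distinct node pairs that are not observed edges.

    Assumes at least k non-edges exist; otherwise this loop would not end.
    """
    observed = {frozenset(e) for e in positive_edges}
    negatives = set()
    while len(negatives) < k:
        pair = frozenset(rng.sample(nodes, 2))
        if pair not in observed:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


def score_edges(edges, emb, similarity):
    """Score each candidate edge by the similarity of its endpoint embeddings."""
    return [similarity(emb[u], emb[v]) for u, v in edges]


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical 2-dimensional node embeddings standing in for model output.
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.1, 0.9],
}
positive_edges = [("a", "b"), ("c", "d")]

rng = random.Random(0)
negative_edges = sample_negatives(sorted(embeddings), positive_edges, k=2, rng=rng)

# Compare the two similarity measures on the same positive/negative split.
results = {}
for name, similarity in (("dot", dot), ("cosine", cosine)):
    pos = score_edges(positive_edges, embeddings, similarity)
    neg = score_edges(negative_edges, embeddings, similarity)
    results[name] = auc(pos, neg)

print(results)
```

Passing the similarity function as an argument mirrors the idea of a selectable strategy: the same evaluation loop can be rerun with a different measure or a different negative sampler without changing the rest of the pipeline.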