
Research On Efficient And Interpretable Graph Mining Technology

Posted on: 2023-02-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y M Yang
GTID: 1520306917479864
Subject: Computer software and theory
Abstract/Summary:
In recent years, the Internet has generated a large amount of relational data, which can usually be described by graphs (or networks). Graph data contain a wealth of useful knowledge and information that is of great significance to real-world production activities. How to effectively mine this knowledge and information is therefore a highly attractive research topic. Research on discovering knowledge in graph data has gone through roughly three waves: traditional graph mining, graph embedding, and graph neural networks. In the past few years, owing to the rapid development of deep learning and its wide application in graph research, deep graph learning techniques have made substantial progress on many graph analysis tasks. However, in this dissertation we observe that most existing graph mining methods still face challenges in four aspects:
(1) The diversity of network modalities. Real-world graphs are usually difficult to describe simply by nodes and edges; they appear in various complex forms, such as attributed graphs and heterogeneous graphs.
(2) The interpretability of models. In practical application scenarios, users not only expect a model to make highly accurate predictions but also want to know why a specific prediction is made.
(3) The large scale of networks. Real-world networks are usually very large, with massive numbers of nodes and edges. A desirable algorithm should therefore have low time complexity and good scalability so that it can be applied effectively to real-world networks.
(4) The scarcity of supervision labels. Current mainstream semi-supervised deep graph learning techniques usually rely on high-quality human-annotated labels, which are often expensive to acquire, or even impossible to obtain because of privacy concerns.
To address these challenges, this dissertation designs and implements several efficient, interpretable, and user-friendly methods for mining valuable knowledge and information from graph data. Our main contributions are summarized in the following four aspects.
(1) We propose a new attributed graph clustering framework. Attributed graph clustering aims to identify clusters that exhibit both structural cohesiveness and attribute homogeneity. We note that existing methods ignore the fact that, in an attributed graph, different clusters usually correlate with different attribute dimensions. To this end, we define and optimize a weight vector that describes the correlation between clusters and attributes, which helps the model capture a personalized correlation pattern between structure and attributes. We then formulate attributed graph clustering as a bi-objective optimization problem and develop an efficient heuristic optimization algorithm, in which the correlation weight vector can be updated synchronously or asynchronously. Theoretical analysis and experimental evaluation show that the framework is both effective and efficient.
(2) We propose a novel graph substructure assembling neural network. Many existing methods achieve high performance on graph classification, but none of them can effectively identify discriminative substructures, which limits their interpretability. In this study, we aim to design a graph neural network that not only achieves high classification performance but also identifies task-specific discriminative substructure features, thereby improving the interpretability of the model. Because the neighbors of a node in graph data have no natural order, we further propose an attention-based sorting mechanism that automatically learns the order of neighbors, which helps the model achieve higher performance with lower variance. Experimental results show that the proposed method achieves high classification performance and effectively discovers discriminative substructure features, giving the model good interpretability.
(3) We propose an innovative heterogeneous graph convolutional neural network. We notice that most existing heterogeneous graph neural networks have two limitations: 1) they require users to manually specify several useful task-specific meta-paths, which is difficult for users; and 2) they require additional, time-consuming pre-processing before the graph convolution operation, which limits their efficiency. To this end, we design an efficient network architecture with three key steps: feature projection, object-level aggregation, and type-level aggregation. Theoretical analysis and experimental results show that the proposed method can automatically evaluate the importance of all possible meta-paths and identify the useful ones for a specific task. By exploiting the structural features conveyed by these meta-paths, the model achieves high performance; moreover, the specific semantics they convey give the model good interpretability.
(4) We propose a novel self-supervised heterogeneous graph pre-training framework. Traditional semi-supervised graph neural networks are usually trained under the guidance of supervision labels, which are expensive to acquire. To alleviate this problem, researchers have recently proposed several methods for pre-training graph neural networks in a self-supervised manner. However, the performance of these methods usually depends on specific strategies for generating positive and negative samples, which limits their flexibility and generalization ability. To address this issue, the proposed framework generates pseudo-labels through structural clustering on heterogeneous graphs and uses them to guide the learning of heterogeneous graph neural networks, without generating any positive or negative samples. We transfer the learned representations to various downstream graph analysis tasks, and the experimental results demonstrate that the proposed framework achieves superior performance, even surpassing some traditional semi-supervised baselines.
Finally, we conclude the dissertation and discuss possible future research directions for graph mining technology.
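The abstract describes contribution (3) only as three steps: feature projection, object-level aggregation, and type-level aggregation. The Python (NumPy) sketch below shows one generic way those three steps could fit together in a single layer. It is an illustration under stated assumptions and does not reproduce the dissertation's architecture. The following details are all hypothetical:

- the function name `hetero_layer`
- one linear projection matrix per node type (`proj`)
- mean pooling as the object-level aggregator
- a single attention query vector (`query`) scored through `tanh` for the type-level step
- the residual sum with the node's own projection
- the toy paper/author/venue graph

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def hetero_layer(features, node_type, neighbors, proj, query):
    """Hypothetical one-layer sketch: projection -> object-level -> type-level.

    features:  dict node -> raw feature vector (length may differ per type)
    node_type: dict node -> type name
    neighbors: dict node -> list of neighbor nodes (missing key = no neighbors)
    proj:      dict type -> matrix of shape (d_type, d_hidden)   [assumption]
    query:     vector of shape (d_hidden,) for type attention     [assumption]
    """
    # Step 1 (feature projection): map every node into one shared hidden space.
    h = {v: np.asarray(x, dtype=float) @ proj[node_type[v]]
         for v, x in features.items()}
    out, type_weights = {}, {}
    for v in features:
        # Step 2 (object-level aggregation): group projected neighbors by type
        # and average each group (mean pooling is an assumption).
        groups = {}
        for u in neighbors.get(v, []):
            groups.setdefault(node_type[u], []).append(h[u])
        if not groups:
            out[v] = h[v]  # node without neighbors keeps its own projection
            continue
        types = sorted(groups)
        summaries = np.stack([np.mean(groups[t], axis=0) for t in types])
        # Step 3 (type-level aggregation): softmax attention over neighbor
        # types; the tanh scoring with one query vector is an assumption.
        alpha = softmax(np.tanh(summaries) @ query)
        out[v] = h[v] + alpha @ summaries  # residual combination (assumption)
        type_weights[v] = dict(zip(types, alpha.tolist()))
    return out, type_weights


# Toy heterogeneous graph (invented for illustration only).
rng = np.random.default_rng(0)
features = {
    "P1": [1.0, 0.0, 1.0], "P2": [0.0, 1.0, 1.0],  # papers: 3-dim features
    "A1": [1.0, 0.5], "A2": [0.2, 0.8],            # authors: 2-dim features
    "V1": [1.0, 1.0],                              # venue: 2-dim features
}
node_type = {"P1": "paper", "P2": "paper", "A1": "author",
             "A2": "author", "V1": "venue"}
neighbors = {"P1": ["A1", "A2", "V1", "P2"], "A1": ["P1"]}
d_hidden = 4
proj = {"paper": rng.normal(size=(3, d_hidden)),
        "author": rng.normal(size=(2, d_hidden)),
        "venue": rng.normal(size=(2, d_hidden))}
query = rng.normal(size=d_hidden)

out, weights = hetero_layer(features, node_type, neighbors, proj, query)
print(weights["P1"])  # one attention weight per neighbor type; they sum to 1
```

In this sketch the per-node type weights are the natural place to look for an interpretable signal. The abstract does not say how the dissertation turns such scores into the importance of full multi-hop meta-paths, so the sketch deliberately stops at one hop.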
Keywords/Search Tags: Graph Mining, Attribute Graph, Heterogeneous Graph, Graph Neural Network, Graph Clustering, Graph Classification