| The global burden of cancer incidence and mortality has grown rapidly in recent years, but limited knowledge of the causes of cancer development, particularly in its early stages, has hampered the development and refinement of cancer therapies. Because cancer is exceedingly heterogeneous, sensitivity to therapy varies widely from patient to patient. The accumulation of biological data and the development of bioinformatics tools have enabled a better understanding of tumor biology, and as a result, personalized medicine has emerged as a rapidly evolving therapeutic strategy. Using highly individual-specific genomic data to understand genetic interactions in cancer development remains a challenge, with significant implications for both individual biomarker discovery and personalized medicine. Graph neural networks are commonly used to analyze biomolecular networks. However, many neural networks are black-box models that only generate predictions and frequently struggle to offer reliable biological and clinical insights. This research applies graph neural networks to analyze cancer patients’ individual-specific gene networks, classify patients by cancer stage, and find the gene clusters that play an essential role in this classification. First, a novel end-to-end hierarchical graph neural network architecture is proposed that operates on sample-specific networks to classify early- and late-stage cancer samples. A partial correlation-based single-sample network is introduced to convert each patient’s gene expression data into a graph that serves as input to the graph neural network. The sparse graph pooling operator proposed in this paper can generate clusters with partially overlapping nodes and learn structural features at multiple scales in the hierarchical graph neural network, avoiding the dense pooled graphs with completely overlapping clusters produced by the standard differentiable graph pooling operator. Next, the interpretability of the cancer staging classification model provides a feasible supervised solution to the otherwise unsupervised problem of discovering key gene clusters: an additional differentiable soft mask layer is used to identify the gene clusters that are crucial to the classification. In addition, this study proposes a perturbation strategy in which the perturbations caused by deleting subgraphs from the input graph are used to assess key gene clusters; samples are then grouped by class to generate stage-level interpretations. Experiments on four real-world gene expression datasets from The Cancer Genome Atlas show that the proposed model not only performs comparably to advanced graph neural network and graph pooling methods on cancer staging but also identifies key gene clusters that have a significant impact on the classification. Furthermore, graph reconstruction experiments using the perturbation strategy reveal that deleting the key subgraphs found by the network reduces the classifier’s confidence in the correct class, and biological function analysis of these subgraphs identifies potential targets for personalized medicine. |
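The abstract does not specify how the partial correlation-based single-sample network is constructed. One published single-sample network (SSN) recipe compares a network estimated on a reference cohort with the same network re-estimated after a single patient is added. The sketch below applies that differencing idea to partial correlations, using the standard identity ρ_ij = -P_ij / sqrt(P_ii P_jj), where P is the precision (inverse covariance) matrix. The SSN-style differencing, the `threshold` value, and the function names are illustrative assumptions, not the author's method.

```python
import numpy as np


def partial_correlation(expr: np.ndarray) -> np.ndarray:
    """Gene-gene partial correlations from an (n_samples, n_genes) matrix.

    Uses rho_ij = -P_ij / sqrt(P_ii * P_jj), with P the precision matrix.
    A pseudo-inverse keeps the sketch from failing on a singular covariance.
    """
    cov = np.cov(expr, rowvar=False)
    prec = np.linalg.pinv(cov)
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    pcor = (pcor + pcor.T) / 2  # guard against round-off asymmetry
    np.fill_diagonal(pcor, 1.0)
    return pcor


def single_sample_network(reference: np.ndarray, sample: np.ndarray,
                          threshold: float = 0.1) -> np.ndarray:
    """Hypothetical SSN-style network for one patient.

    Edge weights are the change in partial correlation when `sample`
    (one expression vector) is added to the `reference` cohort.
    """
    pcor_ref = partial_correlation(reference)
    pcor_plus = partial_correlation(np.vstack([reference, sample]))
    delta = pcor_plus - pcor_ref
    np.fill_diagonal(delta, 0.0)  # no self-loops
    # Keep only edges whose change exceeds the (assumed) threshold.
    return np.where(np.abs(delta) > threshold, delta, 0.0)


rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 5))  # 50 reference samples, 5 genes
patient = rng.normal(size=5)
adjacency = single_sample_network(reference, patient)
print(adjacency.round(3))
```

The resulting weighted, symmetric adjacency matrix (genes as nodes) is the kind of per-patient graph a graph neural network can take as input. With only two genes, the partial correlation reduces to the ordinary Pearson correlation, which offers a quick sanity check.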
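The standard differentiable pooling operator (DiffPool) learns a dense soft assignment matrix S and coarsens the graph as X' = SᵀX and A' = SᵀAS. Because every node receives a nonzero weight for every cluster, the pooled graph is dense and the clusters overlap completely. The abstract does not give the proposed sparse operator's exact form, so the sketch below shows one plausible way to obtain partially overlapping clusters: keep only each node's top-k cluster memberships and renormalize. The top-k rule, the single propagation step used in place of an assignment GNN, and the weight matrix `w_assign` are all assumptions.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sparse_overlapping_pool(adj: np.ndarray, x: np.ndarray,
                            w_assign: np.ndarray, k: int = 2):
    """One coarsening step with top-k sparsified soft assignments (a sketch).

    adj: (n, n) weighted adjacency of a sample-specific gene network
    x: (n, f) node features; w_assign: (f, c) hypothetical learned weights
    Each node keeps its k strongest cluster memberships, so a node can belong
    to several clusters (partial overlap) but not to all of them.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = adj.shape[0]
    # One propagation step stands in for the assignment GNN (an assumption).
    s = softmax((adj + np.eye(n)) @ x @ w_assign, axis=1)
    # Zero every entry except the k largest in each row, then renormalise.
    drop = np.argsort(s, axis=1)[:, :-k]
    np.put_along_axis(s, drop, 0.0, axis=1)
    s = s / s.sum(axis=1, keepdims=True)
    x_pooled = s.T @ x          # (c, f) cluster features
    adj_pooled = s.T @ adj @ s  # (c, c) coarsened adjacency
    return s, x_pooled, adj_pooled


rng = np.random.default_rng(1)
a = rng.random((6, 6))
a = (a + a.T) / 2  # symmetric toy gene network
feats = rng.normal(size=(6, 4))
w = rng.normal(size=(4, 3))  # 3 clusters
s, x_pooled, adj_pooled = sparse_overlapping_pool(a, feats, w, k=2)
print(s.round(2))
```

Stacking several such steps yields the multi-scale hierarchy described above. The sparsity of S also means each cluster can be read off as a concrete, overlapping set of genes.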
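The perturbation strategy scores a gene cluster by how much deleting it from the input graph changes the classifier's output. A minimal sketch, assuming the score is the drop in predicted probability of the true class, follows. `toy_predict_proba` is a hypothetical stand-in for the trained hierarchical GNN (one propagation step, mean readout, fixed linear head), and the cluster names are illustrative.

```python
import numpy as np


def delete_subgraph(adj: np.ndarray, x: np.ndarray, nodes):
    """Remove `nodes` (and all their edges) from the graph."""
    keep = np.setdiff1d(np.arange(adj.shape[0]), nodes)
    return adj[np.ix_(keep, keep)], x[keep]


def confidence_drop(predict_proba, adj, x, nodes, label: int) -> float:
    """p(label | G) - p(label | G without `nodes`).

    Positive values mean the deleted subgraph supported the correct class.
    """
    p_full = predict_proba(adj, x)[label]
    p_pert = predict_proba(*delete_subgraph(adj, x, nodes))[label]
    return float(p_full - p_pert)


def rank_clusters(predict_proba, adj, x, clusters: dict, label: int):
    """Sort clusters by confidence drop, most important first."""
    drops = {name: confidence_drop(predict_proba, adj, x, nodes, label)
             for name, nodes in clusters.items()}
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)


def toy_predict_proba(adj: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Hypothetical two-class classifier standing in for the trained GNN."""
    h = (adj + np.eye(adj.shape[0])) @ x  # one propagation step
    g = h.mean(axis=0)                    # mean readout
    logits = np.array([g.sum(), -g.sum()])
    e = np.exp(logits - logits.max())
    return e / e.sum()


adj = np.zeros((4, 4))
x = np.array([[3.0], [3.0], [0.0], [0.0]])
clusters = {"A": [0, 1], "B": [2, 3]}
ranking = rank_clusters(toy_predict_proba, adj, x, clusters, label=0)
print(ranking)  # cluster "A" ranks first: deleting it lowers p(class 0)
```

Averaging these per-sample drops over all samples of a given stage is one way to turn individual explanations into the stage-level interpretations the abstract describes.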