Cancer Subtypes Identification Based On Multi-omics Graph Clustering

Posted on:2024-03-23

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S G Ge

GTID:1524307118484424

Subject:Control theory and control engineering

Abstract/Summary:

Integrating information from multiple omics data to accurately identify cancer subtypes can provide guidance for personalized treatment and clinical prognosis, and is an important topic in bioinformatics. A key characteristic of current omics data is that the number of samples is much smaller than the number of features. Multi-omics similarity-based methods only need to consider the similarity values between samples in the data integration step, so they have lower computational complexity than methods that must consider all features. These methods usually obtain clustering results by spectral clustering or graph segmentation methods based on spectral graph theory, and are therefore also called multi-omics graph clustering methods. Multi-omics graph clustering methods explore the underlying distribution of multiple omics data by mining the similarity relationships between patients, and have become one of the important machine learning technologies for predicting cancer subtypes.

Although existing multi-omics graph clustering methods have achieved good results in cancer subtype identification, several fundamental questions remain unanswered. 1) In shallow learning methods, first, most existing methods do not consider the quality of the graph, and unreliable graphs may lead to suboptimal clustering results. Second, it is unclear how best to fuse different omics into a single graph. The similarity between samples may be manifested through different omics, yet many existing methods simply average the omics-specific graphs to obtain a common graph, failing to consider the local manifold structure unique to each omics. As a result, the rich heterogeneous information is not fully utilized. Finally, most existing methods do not consider how to integrate the graph learning and clustering processes so that clustering results are obtained without an additional clustering method. 2) In deep learning, most existing methods consider only the feature representation of the data and pay little attention to its structure. However, the data structure reflects the potential similarity relationships between samples and can provide important supplementary guidance for deep learning models.

To address these problems, this thesis designs three shallow learning methods using the Laplacian rank constraint, smooth subspace clustering, and latent representation learning, and one deep learning method using a graph convolutional network. These methods fully consider the complementarity of omics data, mine more accurate similarity information between patients, improve clustering performance, and obtain more accurate cancer subtype results. The main research contents are as follows:

1. To improve the quality of the graph, fully consider the complementary information between omics, and directly obtain the clustering structure, a cancer subtype identification method based on Laplacian rank constrained multi-omics clustering is proposed. First, without involving the original omics data, affinity graphs that contain the same connected components for all omics data are generated by a graph construction method with a Laplacian rank constraint. Then, an adaptive graph fusion method is used to obtain a consensus graph containing information from multiple omics, from which the cluster structure is obtained directly. Finally, the graph construction, graph fusion, and clustering processes are coupled into a unified framework, which enables the modules to update and learn from each other and improves clustering performance.

2. To address the problem that local structure learning struggles to reflect the global structure between samples and ignores the correlation between graph structure and spectral clustering, a cancer subtype identification method based on self-adaptive multi-omics global similarity fusion is proposed. First, smooth subspace learning replaces local similarity learning, which requires a neighborhood to be set, and the global similarity matrices of all omics data are obtained. Then, a self-adaptive graph fusion method based on the Laplacian rank constraint is used to learn a unified global similarity graph, establishing the connection between graph structure and spectral clustering. At the same time, an embedded eigengap method dynamically finds the optimal number of subtypes so that the clustering task is completed directly. Finally, the global similarity matrix construction, graph fusion, cluster number selection, and clustering processes are integrated, and the best clustering results are obtained through joint optimization.

3. To address the problem that the graph construction process relies on the original features and ignores the noise and redundancy of high-dimensional data, a cancer subtype identification method based on latent representation learning and multi-omics spectral clustering is proposed. First, latent representation learning is used to extract a low-dimensional latent representation from the high-dimensional representation of each omics data. Then, based on smooth subspace learning, these low-dimensional latent representations are used as input to generate the corresponding similarity relations. Finally, a fused similarity matrix is obtained by a self-weighted graph fusion method, and the clustering results are obtained by spectral clustering.

4. To address the problem that traditional deep learning clustering methods ignore the structural characteristics of multi-omics data graphs, a cancer subtype identification method based on self-supervised multi-omics graph convolutional network fusion is proposed. First, the k-nearest neighbor and stacked autoencoder methods are used to learn the structural representation and the feature representation of each omics data, respectively. Second, a graph convolutional network is used to jointly learn the structural representation and feature representation of the omics data, generating a high-order structural representation for each omics. Third, two data fusion strategies, adaptive graph fusion and feature representation fusion, are proposed to obtain the fused structural representation and feature representation. Finally, a dual self-supervised method is used to learn the clustering structure, and the whole model is trained end to end.

To evaluate the effectiveness of these four methods in cancer subtype identification, multi-omics data sets of ten cancer types provided by the TCGA database were used for comparative experiments, and the clustering results were verified by the Cox log-rank test and clinical enrichment analysis. The molecular-level differences between the identified cancer subtypes were then further analyzed by screening differentially expressed genes and performing GO enrichment tests. The experimental and analytical results show that the proposed multi-omics clustering algorithms perform well in cancer subtype identification, and some biologically significant cancer subtypes are found through case analyses of AML, BIC and GBM.

There are 28 figures, 23 tables, and 190 references in this dissertation.
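The graph-based ideas summarized in the abstract (per-omics similarity graphs, graph fusion, the link between Laplacian rank and connected components, and the eigengap heuristic) can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the Gaussian kernel, the fixed-weight averaging used in place of adaptive fusion, the toy data, and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Patient-by-patient Gaussian kernel similarity for one omics matrix.

    X has shape (n_samples, n_features). The kernel choice is an assumption.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)      # no self-loops
    return (S + S.T) / 2.0        # enforce exact symmetry


def fuse_graphs(graphs, weights=None):
    """Weighted average of per-omics graphs.

    A simple baseline standing in for adaptive fusion, where the weights
    would be learned rather than fixed.
    """
    graphs = np.asarray(graphs, dtype=float)
    if weights is None:
        weights = np.full(len(graphs), 1.0 / len(graphs))
    return np.tensordot(np.asarray(weights, dtype=float), graphs, axes=1)


def normalized_laplacian(S):
    """Symmetric normalized Laplacian I - D^{-1/2} S D^{-1/2}."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]


def count_components(S, tol=1e-8):
    """Count (numerically) zero Laplacian eigenvalues.

    For a graph without isolated nodes this equals the number of connected
    components, which is the property a Laplacian rank constraint exploits.
    """
    evals = np.linalg.eigvalsh(normalized_laplacian(S))
    return int(np.sum(evals < tol))


def eigengap_num_clusters(S, max_k=10):
    """Pick the cluster number at the largest gap among the smallest eigenvalues."""
    evals = np.linalg.eigvalsh(normalized_laplacian(S))  # ascending order
    gaps = np.diff(evals[: max_k + 1])
    return int(np.argmax(gaps)) + 1


# Toy data: 60 "patients" in 3 well-separated groups, seen through two views
# with different feature counts (hypothetical stand-ins for two omics types).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
views = [
    10.0 * labels[:, None] + 0.1 * rng.standard_normal((60, p))
    for p in (5, 8)
]

fused = fuse_graphs([gaussian_similarity(X) for X in views])
k = eigengap_num_clusters(fused)
print("estimated number of subtypes:", k)
print("near-zero Laplacian eigenvalues:", count_components(fused))
```

On such cleanly separated toy data, both the eigengap estimate and the count of near-zero eigenvalues should recover the three planted groups. Real omics data are far noisier, which is the motivation for learning the graph, the fusion weights, and the cluster structure jointly rather than fixing them as in this sketch.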

Keywords/Search Tags:

Cancer subtypes, Multi-omics graph clustering, Machine learning, Graph fusion, Omics data

Related items

1	Research On Application Of Multi-omics Data Fusion Algorithm Based On Heterogeneous Graph Neural Network In Tumor Classification
2	Research On Cancer Classification Based On Deep Fusion Of Multi-omics Data
3	Research On Breast Cancer Subtypes Clustering Model Based On Multi-omics Data Fusion
4	Intelligent Recognition Of Tumor Subtypes Based On Multi-omics Data
5	Research On The Pathogenic Factors Of Kidney Cancer Based On Multi-omics Data And Machine Learning
6	Research Of Prognostic Carcinoma Molecular Subtypes Based On Omics Data
7	Cancer Driver Gene Identification Algorithm Based On Integrated Analysis Of Multi-omics Data And Network Models
8	Research On Cancer Driver Gene Identification Based On Multi-Omics Data And Graph Neural Network
9	Research On Analysis Of Cancer Subtypes Based On Multi-omics Data
10	Application Of Block Forest In Integrating Clinical And Omics Data To Construct Prognostic Models For Cancer Patients