Cancer Subtypes Identification Based On Multi-omics Graph Clustering

Posted on: 2024-03-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S G Ge
Full Text: PDF
GTID: 1524307118484424
Subject: Control theory and control engineering
Abstract/Summary:
Integrating information from multiple omics data to accurately identify cancer subtypes can guide personalized treatment and clinical prognosis, and is an important topic in bioinformatics. A key characteristic of current omics data is that the number of samples is much smaller than the number of features. Multi-omics similarity-based methods only need to consider the similarity values between samples in the data integration step, so they have lower computational complexity than methods that must consider all features. These methods usually obtain clustering results by spectral clustering or by graph segmentation methods based on spectral graph theory, and are therefore also called multi-omics graph clustering methods. By mining the similarity relationships between patients, they explore the underlying distribution of multiple omics data, and they have become one of the important machine-learning technologies for predicting cancer subtypes.

Although existing multi-omics graph clustering methods have achieved good results in cancer subtype identification, several fundamental questions remain unanswered. 1) In shallow learning methods, first, most existing methods do not consider the quality of the graph, and unreliable graphs may lead to suboptimal clustering results. Second, it is unclear how best to fuse different omics into a single graph. The similarity between samples is often manifested through different omics, yet many existing methods simply average the omics-specific graphs to obtain a common graph and fail to consider the local manifold structure unique to each omics type, so the rich heterogeneous information is not fully utilized. Third, most existing methods do not consider how to integrate the graph learning and clustering processes so that clustering results are obtained without an additional clustering method. 2) In deep learning, most existing methods consider only the feature representation of the data and pay little attention to its structure. However, the data structure reflects the potential similarity relationships between samples and can provide an important supplement and guidance for deep learning models.

To address these problems, this thesis designs three shallow learning methods based on the Laplacian rank constraint, smooth subspace clustering and latent representation learning, respectively, and one deep learning method based on the graph convolutional network. These methods fully consider the complementarity of omics data, mine more accurate similarity information between patients, improve clustering performance, and obtain more accurate cancer subtypes. The main research contents are as follows:

1. To improve the quality of the graph, fully consider the complementary information between omics, and directly obtain the clustering structure, a cancer subtype identification method based on Laplacian rank constrained multi-omics clustering is proposed. First, without involving the original omics data, affinity graphs that contain the same number of connected components for all omics data are generated by a graph construction method with the Laplacian rank constraint. Then, an adaptive graph fusion method is used to obtain a consensus graph containing information from multiple omics, from which the cluster structure is obtained directly. Finally, the graph construction, graph fusion and clustering processes are coupled into a unified framework, which allows the modules to update and learn from each other and improves clustering performance.

2. To address the problem that local structure learning struggles to reflect the global structure between samples and ignores the correlation between graph structure and spectral clustering, a cancer subtype identification method based on self-adaptive multi-omics global similarity fusion is proposed. First, smooth subspace learning replaces local similarity learning, which requires a neighborhood size to be set, and the global similarity matrices of all omics data are obtained. Then, a self-adaptive graph fusion method based on the Laplacian rank constraint is used to learn a unified global similarity graph, establishing the connection between graph structure and spectral clustering. At the same time, an embedded eigengap method dynamically finds the optimal number of subtypes, so that the clustering task is completed directly. Finally, global similarity matrix construction, graph fusion, cluster number selection and clustering are integrated, and the best clustering results are obtained through joint optimization.

3. To address the problem that the graph construction process relies on the original features and ignores the noise and redundancy of high-dimensional data, a cancer subtype identification method based on latent representation learning multi-omics spectral clustering is proposed. First, latent representation learning is used to extract a low-dimensional latent representation from the high-dimensional representation of each omics data. Then, based on smooth subspace learning, these low-dimensional latent representations are used as input to generate the corresponding similarity relations. Finally, the fused similarity matrix is obtained by a self-weighted graph fusion method, and the clustering results are obtained by spectral clustering.

4. To address the problem that traditional deep learning clustering methods ignore the structural characteristics of multi-omics data graphs, a cancer subtype identification method based on self-supervised multi-omics graph convolutional network fusion is proposed. First, the k-nearest neighbor and stacked autoencoder methods are used to learn the structural representation and the feature representation of each omics data, respectively. Second, a graph convolutional network is used to jointly learn the structural and feature representations of the omics data, generating a high-order structural representation for each omics. Third, two data fusion strategies, adaptive graph fusion and feature representation fusion, are proposed to obtain the fused structural representation and feature representation. Finally, a dual self-supervised method is used to learn the clustering structure, enabling end-to-end training of the whole model.

To demonstrate the effectiveness of these four methods in cancer subtype identification, multi-omics data sets of ten cancer types provided by the TCGA database were used for comparative experiments, and the clustering results were verified by the Cox log-rank test and clinical enrichment analysis. The differences between the identified cancer subtypes at the molecular level were then further analyzed by screening differentially expressed genes and by GO enrichment tests. The experimental and analytical results show that the proposed multi-omics clustering algorithms perform well in cancer subtype identification, and case analyses of AML, BIC and GBM reveal some biologically significant cancer subtypes.

There are 28 figures, 23 tables, and 190 references in this dissertation.
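The general graph clustering pipeline described in the abstract (one similarity graph per omics, fusion into a single graph, then spectral analysis with an eigengap to choose the number of subtypes) can be illustrated with a minimal sketch. This is not the thesis's method: it assumes a Gaussian kernel for the similarity graphs and uses plain averaging for fusion, which is the simple baseline the abstract contrasts with adaptive fusion. All function names, the kernel width `sigma`, and the toy data are hypothetical choices made for illustration only.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Dense Gaussian-kernel similarity between the rows (patients) of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-loops
    return S

def fuse_by_average(graphs):
    """Baseline fusion: unweighted mean of the per-omics graphs (assumption)."""
    return sum(graphs) / len(graphs)

def normalized_laplacian(S):
    """Symmetric normalized Laplacian L = I - D^{-1/2} S D^{-1/2}."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

def eigengap_num_clusters(S, max_k=10):
    """Pick k >= 2 maximizing the gap between the k-th and (k+1)-th
    smallest eigenvalues of the normalized Laplacian."""
    eigvals = np.linalg.eigvalsh(normalized_laplacian(S))  # ascending order
    upper = min(max_k, len(eigvals) - 1)
    gaps = np.diff(eigvals[: upper + 1])  # gaps[i] belongs to k = i + 1
    return int(np.argmax(gaps[1:]) + 2)   # skip the trivial k = 1

# Toy data: 60 "patients" in 3 well-separated groups, seen through 2 "omics" views.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
centers = 10.0 * np.arange(3)[:, None]
views = [centers[labels] + 0.5 * rng.standard_normal((60, 5)) for _ in range(2)]

fused = fuse_by_average([gaussian_similarity(X) for X in views])
k = eigengap_num_clusters(fused)
print("estimated number of clusters:", k)
```

The sketch relies on a standard fact from spectral graph theory: the number of zero eigenvalues of a graph Laplacian equals the number of connected components of the graph. A graph with exactly c components therefore has a Laplacian of rank n - c, which is the property a Laplacian rank constraint enforces, and a large gap after the c-th smallest eigenvalue signals c clusters.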
Keywords/Search Tags: Cancer subtypes, Multi-omics graph clustering, Machine learning, Graph fusion, Omics data