
Research On Clustering Enhancement For Cancer Subtyping Based On Genomic Data

Posted on: 2022-06-03 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X Duan
GTID: 1524306908988379 | Subject: Control Science and Engineering
Abstract/Summary:
Cancer is a disease caused by abnormal cell differentiation and proliferation. Many major malignancies are heterogeneous and comprise multiple molecular subtypes with different clinical characteristics. Traditionally, cancer diagnosis, prognosis and treatment decisions have been based largely on histopathological and clinical characteristics such as tumor size, grade and stage. This strategy has some prognostic value, but it predicts drug efficacy poorly because it lacks a clear molecular basis. With the development of sequencing technology, unsupervised classification based on whole-transcriptome gene expression profiling has been widely used to dissect cancer heterogeneity. However, gene expression profiles are high-dimensional, have relatively small sample sizes and contain redundant features, all of which challenge clustering methods. Accurate definition of cancer molecular subtypes will greatly contribute to the personalized treatment of patients and to the design of targeted drugs. In this dissertation, we aimed to dissect cancer molecular heterogeneity using novel clustering enhancement approaches based on cancer genomic data.

First, to address the high dimensionality of gene expression profiles, we proposed ELM-CC, a cancer classification framework based on the extreme learning machine (ELM). ELM-CC maps gene expression into the hidden layer of an ELM auto-encoder, clusters the hidden features to identify cancer subtypes, and then performs classification on an independent dataset. To demonstrate its effectiveness, we applied ELM-CC to molecular subtyping of gastric cancer, ovarian cancer, medulloblastoma and large B-cell lymphoma. Compared with commonly used clustering methods, ELM-CC showed better clustering performance, and the molecular subtypes it identified were more clinically relevant.

Second, most clustering methods take the original gene expression as input and cannot effectively eliminate information redundancy. We applied self-diffusion on local scaling affinity (LSSD) to facilitate similarity learning from gene expression profiles. LSSD first constructs a local scaling affinity from patient-to-patient distances and then applies an iterative self-diffusion process to improve the learned similarities. The diffused graph can substantially improve the effectiveness of downstream spectral clustering. We applied LSSD to molecular subtyping of gastric cancer and ovarian cancer. The results showed that LSSD performed much better than commonly used clustering methods, and the cancer molecular subtypes it identified showed strong biological and clinical relevance.

Third, most strategies for cancer molecular subtyping rely on a single type of transcriptomic data, especially gene expression profiles. However, cancer molecular heterogeneity can also manifest at the genetic or epigenetic level, for example in miRNA expression and DNA methylation, and relying on a single transcriptomic data type may fail to capture heterogeneity at these other genomic levels. Multi-omics data integration offers an effective way to analyze tumor heterogeneity comprehensively. Building on a classical method, similarity network fusion (SNF), we proposed local scaling similarity network fusion (Ls-SNF) for cancer molecular subtyping. Compared with SNF, Ls-SNF requires fewer parameter settings and addresses the problem of unbalanced data scales. We applied Ls-SNF to integrate three types of omics data (gene expression, miRNA expression and DNA methylation) for molecular subtyping of colorectal cancer. In addition, Ls-SNF combined with the self-diffusion process showed better performance in breast cancer molecular subtyping.

Finally, heterogeneity also exists among the cells within a tumor (intra-tumor heterogeneity). The emerging technology of single-cell RNA sequencing (scRNA-seq) provides an attractive tool for dissecting cellular heterogeneity. However, scRNA-seq data are high-dimensional and contain many zero observations and considerable noise, which pose great challenges to existing unsupervised single-cell clustering methods. To overcome these challenges, we applied the self-diffusion on local scaling affinity model to dissect tumor cellular heterogeneity. LSSD first constructs a local scaling affinity to measure cell similarities, then performs an iterative self-diffusion process on the cell-cell distances to enhance these similarities, and finally applies spectral clustering to identify cell types. The effectiveness of LSSD was evaluated on a simulated dataset and two real single-cell datasets. We then applied it to identify the cell types in colorectal cancer, followed by cell annotation and pathway analysis, and found that higher gene expression in fibroblasts was associated with poor survival.
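The ELM-CC step of mapping gene expression into the hidden layer of an ELM auto-encoder can be illustrated with a minimal NumPy sketch. It assumes the standard ELM auto-encoder formulation: random, untrained input weights; sigmoid hidden units; output weights obtained by ridge regression that reconstructs the input; and features given by projecting the data onto those output weights. The function name `elm_autoencoder_features` and its defaults (`n_hidden`, `reg`, `seed`) are hypothetical and are not taken from the dissertation.

```python
import numpy as np

def elm_autoencoder_features(X, n_hidden=32, reg=1.0, seed=0):
    """Hypothetical ELM auto-encoder feature map (illustrative sketch).

    X: (n_samples, n_genes) expression matrix, assumed standardised per gene.
    Returns an (n_samples, n_hidden) matrix on which a clustering method can run.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    # ELM hallmark: input weights and biases are random and never trained.
    W = rng.standard_normal((n_genes, n_hidden)) / np.sqrt(n_genes)
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden-layer output
    # Only the output weights are learned, by ridge regression of X on H:
    # beta = (H^T H + reg * I)^{-1} H^T X, i.e. the network reconstructs its own input.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    # ELM-AE feature representation: project the data onto the learned output weights.
    return X @ beta.T
```

The returned matrix would then be handed to a clustering algorithm such as k-means. Because only `beta` is fitted, and in closed form, the mapping is cheap even when there are thousands of genes and few samples, which is the high-dimension, small-sample setting the abstract describes.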
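The two LSSD stages (a local scaling affinity followed by iterative self-diffusion) can be sketched as follows, with the caveat that the abstract does not give the exact update rule. Local scaling here follows the self-tuning construction W_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), where sigma_i is the distance from sample i to its k-th nearest neighbour (k = 7 is a common choice). For the diffusion step, we assume the simple update A <- A P + I starting from A = I, where P is the row-normalised transition matrix. This accumulates 1- to T-step random-walk transitions, and the dissertation's actual rule and stopping criterion may differ. All function names and defaults are hypothetical.

```python
import numpy as np

def local_scaling_affinity(X, k=7):
    """Local-scaling Gaussian affinity between rows of X (patients or cells)."""
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)  # squared Euclidean distances
    # sigma_i = distance to the k-th nearest neighbour (sorted column 0 is the point itself).
    sigma = np.sqrt(np.sort(D2, axis=1)[:, k])
    W = np.exp(-D2 / (sigma[:, None] * sigma[None, :] + 1e-12))
    np.fill_diagonal(W, 0.0)
    return W

def self_diffusion(W, n_iter=10):
    """Assumed self-diffusion sketch: accumulate random-walk steps via A <- A P + I."""
    n = W.shape[0]
    P = W / (W.sum(axis=1, keepdims=True) + 1e-12)  # row-stochastic transition matrix
    A = np.eye(n)
    for _ in range(n_iter):
        A = A @ P + np.eye(n)
    A = A / A.sum(axis=1, keepdims=True)  # renormalise rows
    return (A + A.T) / 2                  # symmetrise for spectral clustering
```

Because each sigma_i adapts to local density, tight and loose groups both receive usable affinities without a global bandwidth. The diffusion then strengthens links between samples that share many short random-walk paths, which is the intuition behind using the diffused graph as input to spectral clustering.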
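Ls-SNF can be sketched by analogy with SNF. In SNF, each omics type contributes a full kernel P (diagonal 1/2, off-diagonal row mass 1/2) and a sparse K-nearest-neighbour kernel S, and each view repeatedly diffuses the average of the other views through its own neighbour graph: P(v) <- S(v) * mean_{u != v} P(u) * S(v)^T. The sketch below assumes that Ls-SNF simply replaces SNF's scaled exponential kernel, which needs a bandwidth hyperparameter, with the local scaling affinity. This is one plausible reading of the abstract, not the author's code, and the name `ls_snf` and its defaults are hypothetical.

```python
import numpy as np

def local_scaling_affinity(X, k=7):
    # Gaussian affinity with sigma_i = distance to the k-th nearest neighbour.
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    sigma = np.sqrt(np.sort(D2, axis=1)[:, k])
    W = np.exp(-D2 / (sigma[:, None] * sigma[None, :] + 1e-12))
    np.fill_diagonal(W, 0.0)
    return W

def full_kernel(W):
    # SNF full kernel: diagonal 1/2, each row's off-diagonal mass 1/2, then symmetrised.
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    P = W / (2 * W.sum(axis=1, keepdims=True) + 1e-12)
    np.fill_diagonal(P, 0.5)
    return (P + P.T) / 2

def knn_kernel(W, K):
    # SNF sparse kernel: keep each row's K strongest neighbours, rows sum to 1.
    S = np.zeros_like(W)
    idx = np.argsort(-W, axis=1)[:, :K]
    rows = np.arange(W.shape[0])[:, None]
    S[rows, idx] = W[rows, idx]
    return S / (S.sum(axis=1, keepdims=True) + 1e-12)

def ls_snf(views, k=7, K=10, n_iter=20):
    """Fuse >= 2 omics matrices (same samples, same row order) into one network."""
    m = len(views)
    Ws = [local_scaling_affinity(X, k) for X in views]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [knn_kernel(W, K) for W in Ws]
    for _ in range(n_iter):
        # Synchronous update: every view diffuses the mean of the other views' kernels.
        Ps = [full_kernel(Ss[v] @ (sum(Ps[u] for u in range(m) if u != v) / (m - 1)) @ Ss[v].T)
              for v in range(m)]
    return sum(Ps) / m
```

Because sigma_i scales with the data, multiplying any single omics matrix by a constant leaves its affinity, and therefore the fused network, unchanged. This is one concrete sense in which local scaling can ease unbalanced data scales across omics types.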
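Both LSSD applications (patients in bulk data, cells in scRNA-seq data) end with spectral clustering on the diffused affinity. The abstract does not name the variant, so this sketch assumes the common normalised form: take the top eigenvectors of D^{-1/2} A D^{-1/2}, normalise each row, and run k-means. The k-means uses a farthest-first initialisation. `spectral_clustering` and `kmeans` are hypothetical helper names written here with plain NumPy.

```python
import numpy as np

def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialisation."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(Z)))]
    for _ in range(1, k):
        # Next centre: the point farthest from all centres chosen so far.
        d = np.min(((Z[:, None, :] - Z[idx][None, :, :]) ** 2).sum(axis=2), axis=1)
        idx.append(int(d.argmax()))
    centers = Z[idx].astype(float)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(A, n_clusters, seed=0):
    """Normalised spectral clustering on a symmetric, non-negative affinity A."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + 1e-12)
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(M)                        # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]                          # top-k eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)  # row-normalise
    return kmeans(U, n_clusters, seed=seed)
```

In practice, `A` would be the symmetrised output of the self-diffusion step and `n_clusters` the number of subtypes or cell types under consideration. The labels are arbitrary integers, so they can only be compared with reference annotations up to a permutation.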
Keywords/Search Tags: Cancer molecular heterogeneity, Extreme learning machine, Self-diffusion map, Multi-omics data integration, Tumor cell type identification