
Stratification And Prognosis Of Cancer Based On Integrating Multiple Omics Data

Posted on: 2022-05-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Z He
Full Text: PDF
GTID: 1484306602492674
Subject: Computer application technology
Abstract/Summary:
Cancer is one of the major diseases threatening human health. In recent years, its incidence has trended upward and its mortality rate has remained high. Precisely stratifying patients with the same type of cancer into subtypes that are significantly associated with clinical outcomes helps clinicians target diagnosis and treatment, and is an important step toward precision medicine. Accurately predicting patient survival time is likewise of great significance in guiding clinicians to formulate appropriate treatment plans. Traditionally, in clinical practice, cancer subtypes and patient survival are judged mainly from clinical characteristics such as tumor size, grade, and stage. Cancer, however, is a complex disease with a high degree of molecular heterogeneity. With the rapid development of next-generation sequencing technology, cancer multi-omics data have accumulated extensively. These omics data differ in type and dimensionality, and they are high-dimensional and redundant. How to integrate multiple omics data effectively to improve the stratification and prognosis of cancer patients is therefore an urgent problem in cancer bioinformatics. This dissertation addresses this problem by studying the stratification and prognosis of cancer based on integrating multi-omics data. Its main contributions are as follows:

(1) A network-based method that integrates somatic mutation and gene expression data is proposed to identify cancer subtypes. It performs better than other methods in identifying major cancer subtypes. Because each cancer has its own specific characteristics, the method introduces a cancer-specific network to integrate somatic mutation and gene expression indirectly, so that data integration takes the specificity of the cancer into account. First, a cancer-specific significant co-expression network (SCN) is constructed for each type of cancer from its gene expression data. The somatic mutation data are then mapped onto the SCN, propagated over it, and used for clustering. For clustering, the method adopts an improved version (netNMF_HC) of network-regularized nonnegative matrix factorization (netNMF) to obtain more precise clusters. The method is applied to several datasets, including ovarian cancer (OV), lung adenocarcinoma (LUAD), and uterine corpus endometrial carcinoma (UCEC) cohorts. The results show that the proposed algorithm successfully identifies survival-related cancer subtypes and that, for most cancer types in this study, it outperforms the traditional Network-based Stratification (NBS) method in identifying informative subtypes that are significantly associated with clinical outcomes. In particular, it identifies survival-associated UCEC subtypes that the NBS method does not.

(2) A method that integrates gene expression and clinical variables based on their correlation is proposed for the stratification of breast cancer. It is more effective than other methods in identifying breast cancer subtypes. Because breast cancer samples with different clinical phenotypes show different gene expression patterns, the method uses maximum relevance minimum redundancy (mRMR) feature selection to integrate clinical variables into gene expression indirectly for breast cancer subtyping. The method has two stages, gene feature selection and sample clustering, and data integration takes place in the feature selection stage. mRMR selects the genes whose expression has the highest correlation with the clinical variables and the least redundancy among themselves, and K-means then clusters the breast cancer samples on the selected gene expression features. The method is compared with two commonly used stratification methods that rely on expression data alone: prediction analysis of microarray 50 (PAM50) and highest variability (HV). It outperforms both in identifying subtypes significantly associated with five-year survival and recurrence time. Specifically, it identifies recurrence-associated breast cancer subtypes that PAM50 and HV do not. It also discovers three survival-associated Luminal A subgroups and two survival-associated Luminal B subgroups.

(3) A method based on multiple kernel learning (MKL) that integrates multi-omics data, including somatic mutations, is proposed for the prognosis of breast cancer. It outperforms other methods in breast cancer prognosis. The design is motivated by the impact of somatic mutations on breast cancer prognosis and by the high dimensionality and redundancy of multi-omics data. For each data type, the method first applies mRMR feature selection to choose the features with the largest relevance to survival time and the smallest redundancy among themselves. It then uses MKL to integrate somatic mutation data with the commonly used gene expression, copy number variation, methylation, and protein expression data for breast cancer prognosis. The experimental results show that this method achieves higher prognostic performance for breast cancer than the other methods compared. A comparison of models built on different omics data shows that integrating somatic mutations clearly improves prognostic accuracy. A comparison of feature selection methods shows that mRMR is superior to the others in this study. In addition, when integrating multi-omics data, MKL achieves better prognostic performance than traditional classifiers.
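
The mRMR selection step used in contributions (2) and (3) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not state which relevance or redundancy measure is used, so the sketch assumes absolute Pearson correlation for both and the common "relevance minus mean redundancy" greedy criterion. The function names `pearson` and `mrmr_select`, the gene names, and the toy data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mrmr_select(features, target, k):
    """Greedily pick k feature names (assumed criterion, see lead-in).

    features: dict mapping a feature name to its per-sample values.
    target:   per-sample values of the clinical variable.
    Each step picks the feature maximizing
    |corr(feature, target)| - mean |corr(feature, already selected)|.
    """
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): g2 is an exact rescaling of g1, g3 is uncorrelated with g1,
# and the target depends strongly on g1 and weakly on g3.
features = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [2, 4, 6, 8, 10, 12],
    "g3": [1, -1, 0, 0, -1, 1],
}
target = [2 * a + b for a, b in zip(features["g1"], features["g3"])]

selected = mrmr_select(features, target, k=2)
print(selected)
```

In this toy case the second pick is `g3` rather than the remaining duplicate of the first gene, even though the duplicate has much higher relevance, because its redundancy with the already selected gene cancels that relevance. The selected features would then be passed to the downstream step (K-means in contribution (2), MKL in contribution (3)).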
Keywords/Search Tags:Gene network, Cancer stratification, Correlation, Integration of multi-omics, Breast cancer prognosis