Probabilistic And Statistical Methods In Cancer Genome Analysis

Posted on: 2021-09-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Q Huang
Full Text: PDF
GTID: 1480306542996549
Subject: Mathematics
Abstract/Summary:
Cancer is a complex disease caused by somatic mutations. Identifying driver mutations is a central task in cancer genome analysis and supports the development of precision medicine for cancer. Rapid advances in high-throughput sequencing technology and somatic mutation-calling algorithms have produced large-scale somatic mutation data sets, such as the TCGA (The Cancer Genome Atlas) somatic mutation data. However, only a few somatic mutations are causal; most are passenger mutations generated at random as a result of genomic instability in cancer genomes. Distinguishing driver mutations from passenger mutations is therefore a major challenge for cancer genome analysis. Researchers have developed a variety of computational methods to identify driver mutations, and the main idea underlying these methods is to find genomic regions that harbor significantly more non-silent mutations than expected by chance. Existing methods, however, do not fully accommodate the heterogeneity of somatic mutations, which leaves them with insufficient statistical power to distinguish driver mutations from passenger mutations. Meanwhile, comprehensive tools for associating somatic mutation data with clinical data are lacking, although such tools are essential for developing individualized treatment strategies for cancer patients.

The key to identifying driver mutations effectively is an accurate background mutation rate model. In this dissertation, we propose a Poisson linear mixed-effect model (ProBMR) for the background mutation rate of somatic mutations. ProBMR estimates background mutation rates by using genomic covariate information. Compared with existing background mutation rate models, such as the non-parametric model used in MutSigCV, ProBMR borrows information across cancer genomes in a data-adaptive manner and accounts in detail for the heterogeneity of cancer somatic mutations.

The analysis of significantly mutated genes and of mutually exclusive gene pairs are two important research topics in cancer genome analysis. In both, driver signals are confounded by background mutations. As a result, there is limited power to identify driver mutations, and false positive findings may arise. To address these limitations, we developed the SMGCT algorithm and the MEHAT algorithm, both based on the ProBMR model. The SMGCT algorithm is a convolution test that identifies driver genes by accurately accounting for background mutation rates. In simulations and in real data analysis, the SMGCT algorithm has higher statistical power than existing approaches and detects driver genes with fewer recurrent mutations. The MEHAT algorithm is a statistical method based on the Poisson-Multinomial distribution that identifies mutually exclusive gene pairs. It takes into account the asymmetric distribution of mutually exclusive features and reduces the false positive discoveries caused by frequently mutated genes.

Except for a few highly recurrent mutations, the association between somatic mutations and the overall survival of cancer patients has not been investigated directly. To address this gap, we propose the SPA-Cancer algorithm, also based on the ProBMR model. The SPA-Cancer algorithm weights and aggregates the somatic mutations in a pathway and stably identifies pathways that are associated with patient survival. Notably, these stably associated pathways are known to play important roles in cancer biology.
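The general idea stated in the abstract, flagging a gene when it carries more non-silent mutations than a covariate-adjusted background rate predicts, can be illustrated with a minimal sketch. This is not ProBMR or SMGCT: it assumes a plain fixed-effects Poisson log-linear regression with no random effects, a single hypothetical covariate, and a simple Poisson upper-tail test in place of the convolution test. All data values, names, and the covariate are invented for illustration.

```python
import math

def fit_poisson_loglinear(counts, exposures, covariate, iters=50):
    """Fit log E[y] = log(exposure) + b0 + b1 * x by Newton's method.

    Assumption: one covariate and no random effect, which is simpler than
    the mixed-effect model described in the abstract.
    """
    b0 = math.log(sum(counts) / sum(exposures))  # start at the pooled rate
    b1 = 0.0
    for _ in range(iters):
        mu = [e * math.exp(b0 + b1 * x) for e, x in zip(exposures, covariate)]
        # Score (gradient of the Poisson log-likelihood)
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * x for y, m, x in zip(counts, mu, covariate))
        # Fisher information (2 x 2), inverted in closed form
        h00 = sum(mu)
        h01 = sum(m * x for m, x in zip(mu, covariate))
        h11 = sum(m * x * x for m, x in zip(mu, covariate))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1

def poisson_upper_tail(k, mu):
    """Return P(Y >= k) for Y ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)   # P(Y = 0)
    cdf = term
    for i in range(1, k):  # accumulate P(Y = 1) .. P(Y = k - 1)
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)

# Hypothetical "background" genes: exposure (e.g. coding length),
# one standardized genomic covariate, and observed mutation counts.
exposures = [1000, 1500, 2000, 1200, 1800, 900]
covariate = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.2]
counts = [2, 4, 6, 5, 9, 3]

b0, b1 = fit_poisson_loglinear(counts, exposures, covariate)
fitted_mu = [e * math.exp(b0 + b1 * x) for e, x in zip(exposures, covariate)]

# Hypothetical candidate gene: compare observed count with the
# background expectation implied by the fitted model.
observed = 15
expected = 1000 * math.exp(b0 + b1 * 0.0)
p_value = poisson_upper_tail(observed, expected)
print(f"expected={expected:.2f} observed={observed} p={p_value:.3g}")
```

In this toy setting a small p-value only says that the count exceeds what the fitted background rate predicts. A method of the kind the abstract describes would also model unexplained heterogeneity between genes and samples, which this sketch leaves out.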
Keywords/Search Tags: somatic mutation, background mutation rate, driver gene, mutually exclusive gene pair, stable associated pathway