
Bayesian Multi-Locus Model In High Dimensional Omics Data

Posted on: 2019-04-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W W Duan
GTID: 1314330545985419
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Genome-wide association studies (GWAS), once a flourishing field, have identified a large number of single-nucleotide polymorphisms (SNPs) associated with complex human traits. For most traits, however, the discovered SNPs explain only a small fraction of the heritability. One possible explanation for this so-called “missing heritability” is that the commonly used single-variant regression has difficulty detecting variants with weak effects. Multi-locus regression has higher power, but fitting it to large-scale, high-dimensional omics data is challenging.

In Section 1, a Bayesian multi-locus model using variational inference (BAL-VI) is proposed with an adaptive conditional Laplace prior. The prior applies an adaptive penalty to the variant coefficients, while variational inference (VI) provides fast and accurate posterior computation in large-scale data. In simulation studies, BAL-VI is evaluated in terms of variable selection, parameter estimation, and outcome prediction. We also examine several important issues: the width of the interval estimates, the difference in fit between VI and MCMC, and the sensitivity of the results to the hyper-parameter. We then apply the model to a lung cancer GWAS dataset. The numerical studies show that: 1) the narrow 95% Bayesian credible interval of BAL-VI yields a high false positive rate, which can be well controlled by an additional heritability threshold; 2) the overall results indicate that BAL-VI performs best; 3) BAL-VI avoids attenuating the true effect when the causal variant lies in a region of strong linkage disequilibrium, which helps to identify the causal variant or region; 4) the fit of BAL-VI is sensitive to the hyper-parameter. The real data analysis shows that: 5) BAL-VI can be applied to genome-wide data and finds multiple variants associated with lung cancer risk; 6) the model has a marked advantage in computation speed: for example, BAL-VI fits the data in about half a day, whereas the MCMC model needs 5 days.

In Section 2, the linear EMVS model is generalized to a Weibull parametric survival model (SurvEMVS). The model imposes a continuous spike-and-slab prior on the variant effects, which facilitates variable selection. An EM algorithm is employed for fast posterior exploration and parameter estimation. Because no closed-form solution is available, a variant of cyclic coordinate descent (CCD) is nested within the EM algorithm to update the effect estimates quickly. The extended Bayesian information criterion (EBIC) is used to guide hyper-parameter tuning. In numerical studies, SurvEMVS is evaluated in terms of variable selection, parameter estimation, and survival prediction. We also consider the impact of EBIC with various values of γ on model fitting, and explore scenarios that violate the Weibull distributional assumption. The real data analyses include a lung cancer GWAS dataset and a stomach cancer gene expression dataset. The numerical studies show that: 1) the overall results indicate that SurvEMVS performs best with γ = 0.5; 2) the final model becomes much sparser as γ increases; 3) SurvEMVS is robust when the true survival distribution moderately violates the Weibull assumption. The real data analyses indicate that: 4) several variants identified by SurvEMVS may influence cancer prognosis, and some of them are successfully validated in an external dataset, which suggests that the model can be applied to various types of omics data; 5) the EM algorithm of the model converges rapidly.

In Section 3, we summarize the two studies and discuss the application prospects of Bayesian statistics in omics data.
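
As an illustration of the EBIC-based tuning used for SurvEMVS, the following minimal Python sketch computes the criterion in its commonly cited form, EBIC = -2 log L + k log n + 2γ log C(p, k), where γ is the conventional name for the EBIC tuning parameter, k is the number of selected variables, n the sample size, and p the number of candidate variables. The function names and all numbers are hypothetical, and the dissertation may use a different variant of the criterion.

```python
import math


def log_binomial(p: int, k: int) -> float:
    """Natural log of the binomial coefficient C(p, k), computed via lgamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(log_likelihood: float, n_samples: int, n_candidates: int,
         n_selected: int, gamma: float = 0.5) -> float:
    """EBIC in its commonly cited form (an assumption, not taken from the thesis).

    gamma = 0 reduces to the ordinary BIC; larger gamma penalizes model
    size more heavily when the number of candidate variables is large.
    """
    bic = -2.0 * log_likelihood + n_selected * math.log(n_samples)
    return bic + 2.0 * gamma * log_binomial(n_candidates, n_selected)


if __name__ == "__main__":
    # Hypothetical fits: (maximized log-likelihood, number of selected variables).
    fits = [(-520.0, 3), (-512.0, 8), (-509.0, 20)]
    for gamma in (0.0, 0.5, 1.0):
        # Pick the candidate fit with the smallest EBIC for this gamma.
        best = min(fits, key=lambda f: ebic(f[0], 300, 10000, f[1], gamma))
        print(f"gamma={gamma}: preferred model size = {best[1]}")
```

Because the last term grows with both γ and the model size, a larger γ favors smaller models, which is consistent with the reported finding that the final model becomes sparser as the parameter increases.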
Keywords/Search Tags:Bayesian multi-locus model, genomics, transcriptomics, high-dimensional data, parametric survival model, variational inference, EM algorithm, spike-and-slab prior, lung cancer, stomach cancer