Bayesian methodologies for genomic data with missing covariate

Posted on: 2009-11-17
Degree: Ph.D
Type: Dissertation
University: University of Florida
Candidate: Li, Zhen
Full Text: PDF
GTID: 1440390002990598
Subject: Biology
Abstract/Summary:
With advancing technology, large single nucleotide polymorphism (SNP) datasets are readily available. For the ADEPT 2 project, we have candidate SNPs and interesting phenotypic trait values available, while about 10% of the SNPs are missing.

Standard software packages cannot deal adequately with missing SNP data. For example, SAS either uses an available case analysis (which employs all the complete cases for the inference of target parameters) or the MI (or MIANALYZE) procedure, in which SAS assumes multivariate normal distributions for all the variables. Some software deletes the incomplete observations, which is generally unacceptable for datasets with many SNPs, because it can give biased estimates or even delete all the data. More recently, single-SNP association, linkage-disequilibrium-based imputation, and haplotype-based imputation have been proposed.

I describe a Bayesian hierarchical model that explains the SNP effects on the phenotypic traits and incorporates family structure information for the observations. For this association test, information on the degree of linkage disequilibrium is not required, and missing SNPs are imputed based on all the available information. We use a Gibbs sampler to estimate the parameters and prove that updating one SNP at each iteration still preserves the ergodic property of the Markov chain while improving computation speed.

We also run a stochastic search algorithm to find good subsets of variables, or SNPs. The Bayes factor is used as the model comparison criterion, and a new Bayes factor approximation formula is proposed. A hybrid Metropolis-Hastings algorithm is used to search for good models in the model space and is proven to have the ergodic convergence property.

To improve the computation speed, a matrix identity is first applied to avoid direct calculation of matrix inverses and determinants; the imputed missing SNPs are then replaced with the average of the imputed SNPs, which substantially increases the computation speed.
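
The abstract does not say which matrix identity is used. As an illustration only, the sketch below assumes the Woodbury identity together with the matrix determinant lemma, applied to a covariance of the form A = D + U C U^T, where D is diagonal and U has few columns. The function name and the variables d, U and C are hypothetical and are not taken from the dissertation.

```python
import numpy as np

def woodbury_inverse_and_logdet(d, U, C):
    """Inverse and log-determinant of A = diag(d) + U @ C @ U.T.

    Assumes d > 0 (n entries), U is n x k, and C is k x k symmetric
    positive definite. Only k x k systems are solved, so the n x n
    matrix A is never inverted or factored directly.
    """
    d_inv = 1.0 / d
    DinvU = U * d_inv[:, None]            # D^{-1} U, shape n x k
    S = np.linalg.inv(C) + U.T @ DinvU    # k x k "capacitance" matrix
    # Woodbury: A^{-1} = D^{-1} - D^{-1} U S^{-1} U^T D^{-1}
    A_inv = np.diag(d_inv) - DinvU @ np.linalg.solve(S, DinvU.T)
    # Determinant lemma: log det A = log det S + log det C + log det D
    logdet = (np.linalg.slogdet(S)[1]
              + np.linalg.slogdet(C)[1]
              + np.sum(np.log(d)))
    return A_inv, logdet

# Small check against the direct computation (synthetic data only)
rng = np.random.default_rng(0)
n, k = 50, 3
d = rng.uniform(0.5, 2.0, size=n)
U = rng.normal(size=(n, k))
C = 0.7 * np.eye(k)
A = np.diag(d) + U @ C @ U.T
A_inv, logdet = woodbury_inverse_and_logdet(d, U, C)
```

When k is much smaller than n, the cost is dominated by the k x k solve and the n x k products rather than by an n x n factorization, which is the kind of saving the abstract alludes to.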
Keywords/Search Tags: SNP, Missing, Data, Snps, Computation speed, Available