Causal modeling in quantitative genomics

Posted on: 2009-05-06
Degree: Ph.D.
Type: Dissertation
University: University of Washington
Candidate: Chen, Lin
Full Text: PDF
GTID: 1445390002992836
Subject: Biology
Abstract/Summary:
An ultimate goal of genomic research is to understand the underlying mechanisms of living organisms. This includes understanding how features in the genome (e.g., DNA, genes, mRNAs, proteins, metabolites) regulate and interact with each other, and how they further interact with the environment and affect the physical characteristics of living organisms. High-throughput biological technologies now produce large amounts of data that bear on these questions, giving us the opportunity to systematically explore causal regulatory patterns in the genome. However, the typically high dimensionality of genomic data also brings new challenges to traditional analytic approaches in causal inference and modeling. Existing methods of causal modeling in quantitative genomics studies insufficiently address four issues: the large number of predictors relative to the small sample size, the need for conservative but informative significance measures, the difficulty of distinguishing causality from correlation, and the treatment of hidden variables. In this dissertation, we present a rigorous statistical framework that uses integrative quantitative genomics to infer causal regulatory relationships among genes, gene products, and traits. This framework is based on the highly successful concept of randomization, which has been used as a gold standard for inferring causality. Specifically, the random inheritance of genotypes at any particular locus during meiosis induces at least partial randomization in the transcription levels of the genes it affects. This naturally occurring randomization process can be used to infer causal effects on expression traits, or from expression traits to higher-order traits. We build causal models and search for causal regulatory relationships among pairs of transcription levels that are not affected by any hidden variable.
With genetics-of-gene-expression data, in which both genotypes and transcript expression levels are measured on each offspring of a random cross, we propose a non-parametric empirical Bayesian method to estimate the posterior regulatory probability for any pair of gene transcripts in the genome, and we then build directed transcriptional regulatory networks in a bottom-up fashion. The proposed idea has general applicability and can be extended to infer causality from transcripts to higher-order traits of organisms. We argue that by borrowing information across genes, we are able to model the effect of hidden variables with principal component analysis. We further extend the theoretical framework to accommodate the possible effects of hidden variables. We also propose an algorithm, together with a p-value calculation, to identify the genes whose transcription levels are causal for a quantitative trait.
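
The randomization argument in the abstract can be illustrated with a small simulation. The sketch below is not the dissertation's non-parametric empirical Bayesian method. It is a simplified linear-Gaussian toy, and the sample size, effect sizes, and function names (`simulate_cross`, `partial_corr`) are all assumptions made for illustration. It simulates a randomly inherited genotype that affects transcript T1, which in turn regulates transcript T2. If the causal chain is genotype → T1 → T2, the genotype should carry no further information about T2 once T1 is accounted for, whereas the reverse check should fail.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

def simulate_cross(n=2000, seed=0):
    """Hypothetical cross: genotype -> T1 -> T2 (effect sizes are assumed)."""
    rng = np.random.default_rng(seed)
    # Random inheritance at one locus: each offspring gets allele 0 or 1.
    genotype = rng.integers(0, 2, size=n).astype(float)
    # T1 is partially randomized by the genotype.
    t1 = 1.0 * genotype + rng.normal(0.0, 1.0, n)
    # T2 depends on the genotype only through T1.
    t2 = 0.8 * t1 + rng.normal(0.0, 1.0, n)
    return genotype, t1, t2

genotype, t1, t2 = simulate_cross()

# Consistent with T1 -> T2: genotype and T2 are nearly uncorrelated given T1.
forward = partial_corr(genotype, t2, t1)
# Inconsistent with T2 -> T1: genotype and T1 stay correlated given T2.
reverse = partial_corr(genotype, t1, t2)

print(f"genotype vs T2 given T1: {forward:.3f}")
print(f"genotype vs T1 given T2: {reverse:.3f}")
```

In this toy setting the first partial correlation should be close to zero and the second clearly positive. A hidden variable that affects both transcripts would break this pattern, which is the situation the abstract's hidden-variable extension is meant to handle.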
Keywords/Search Tags: Causal, Quantitative, Genes, Modeling