
Discovery of complex pathways from observational data

Posted on: 2011-04-15
Degree: Ph.D.
Type: Dissertation
University: University of Southern California
Candidate: Baurley, James William
Full Text: PDF
GTID: 1448390002460057
Subject: Biology
Abstract/Summary:
The etiology of complex diseases may involve a network of biological interactions, both genetic and environmental. With the availability of high-throughput genotyping platforms, epidemiologists can thoroughly evaluate the genetic component of complex diseases. While analysis is seemingly straightforward when the unit of analysis is a single variant, comprehensive analysis of pathways is fundamentally more involved. The objective of pathway-based approaches is ultimately to uncover associations that have a biological pathway context and that are often undetectable from a "single variable at a time" perspective. Recently, there has been growing recognition that analysis methods focused on pathways are needed to improve the detection of interactions.

I introduce two pathway-based frameworks aimed at the discovery of complex pathways from observational data. Both approaches account for pathway uncertainty by basing inference on the posterior distribution of models. They also allow external pathway knowledge to be incorporated, either as priors on pathway parameters and structure or as a means of enhancing algorithm performance.

The Algorithm for Learning Pathway Structure (ALPS) discovers plausible pathways from observational data and estimates both the net effect of the pathway and the relationships (interactions) among genetic or environmental risk factors. In this framework, a topology links combinations of observed variables through intermediate nodes (representing interactions) to a disease outcome. Biologic knowledge can be readily applied as a "prior topology" to give preference to more biologically plausible models. I demonstrate that ALPS can correctly identify the true risk factors and interactions across various simulated pathway configurations.

As the number of genetic variants increases to the scale of modern candidate gene studies and genome-wide association studies (GWAS), the space of models grows extremely large. The second framework introduced is a Bayesian model selection algorithm (known as PEAK) in which parallel MCMC chains are used to tune the proposal density so that it better approximates the target density (i.e., the posterior). PEAK organizes the model space into subspaces linked through a graph derived from an ontology or a domain expert. I demonstrate the flexibility and efficiency of the framework by running PEAK on various simulated graph structures (informative and uninformative) and causal models.

ALPS and PEAK were applied to real data in a pathway analysis of oxidative stress genes in a GWAS of asthma. By considering multivariate models with interactions, these methods uncovered several associations with strong Bayes factors that were missed by traditional marginal scans. ALPS and PEAK provide a valuable toolkit for pathway-based investigations of complex diseases.
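
The idea of basing inference on the posterior distribution of models, and of interactions being invisible to a marginal scan, can be illustrated with a toy sketch. This is not ALPS or PEAK: it simply enumerates four hypothetical models for two binary risk factors and scores each with a Beta-Binomial marginal likelihood. The counts in CELLS, the Beta(1, 1) prior on each risk, the uniform prior over models, and all function names are assumptions made for illustration only.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

# Hypothetical counts: (g1, g2) -> (cases, total). Risk is high only when
# exactly one factor is present, so neither factor shows a marginal effect.
CELLS = {(0, 0): (8, 40), (0, 1): (32, 40), (1, 0): (32, 40), (1, 1): (8, 40)}

# Each model maps a genotype cell to a risk group; subjects in the same
# group share one unknown disease risk.
MODELS = {
    "null": lambda g1, g2: 0,
    "G1 only": lambda g1, g2: g1,
    "G2 only": lambda g1, g2: g2,
    "G1 x G2": lambda g1, g2: (g1, g2),
}

def log_marginal_likelihood(group_of):
    """Integrate each group's risk out under an assumed Beta(1, 1) prior."""
    groups = {}
    for (g1, g2), (cases, total) in CELLS.items():
        key = group_of(g1, g2)
        c, n = groups.get(key, (0, 0))
        groups[key] = (c + cases, n + total)
    return sum(log_beta(c + 1, n - c + 1) for c, n in groups.values())

def posterior_model_probs():
    """Posterior probability of each model under a uniform model prior."""
    log_ml = {name: log_marginal_likelihood(f) for name, f in MODELS.items()}
    top = max(log_ml.values())  # subtract the max for numerical stability
    weights = {name: exp(v - top) for name, v in log_ml.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}

if __name__ == "__main__":
    for name, p in posterior_model_probs().items():
        print(f"{name:8s} {p:.6f}")
```

With these assumed counts, the single-factor models score no better than the null model, while the interaction model receives nearly all of the posterior mass. Exhaustive enumeration is only feasible for a handful of variables; at GWAS scale the model space must be searched stochastically, which is the setting the abstract describes for PEAK.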
Keywords/Search Tags:Complex, Pathway, PEAK, ALPS, Interactions, Data, Genetic