
Informed structural priors for Bayesian networks: Applications in molecular biology using heterogeneous data sources

Posted on: 2007-01-27
Degree: Ph.D.
Type: Thesis
University: Brown University
Candidate: Leach, Sonia M
Full Text: PDF
GTID: 2448390005960267
Subject: Biology
Abstract/Summary:
The main goal of this thesis is to investigate how a collection of separate prior information sources, of varying scale and reliability, can be integrated in a principled manner to give a more complete understanding of a problem domain. We consider a case study from the biological domain: inferring a network of gene interactions. Biology has a rich and diverse body of existing knowledge that can be leveraged when learning such a model; the challenge is to develop meaningful ways of encoding this prior knowledge that take into account the heterogeneity of the individual prior data sources. We consider a host of 'experts' that specify explicit or implicit interactions between genes, such as physical interaction between the genes, shared biological function, or even co-occurrence in literature references. Though each individual source may be a poor indicator of gene interaction when used alone, we demonstrate techniques for combining the experts to formulate a consensus belief, or likelihood, that any two genes are related. The resulting probability distributions over gene relationships are then used in two distinct ways. The first is to provide structural priors for learning Bayesian networks. We show that using a prior distribution over interactions between genes can significantly increase the speed and quality of the search for high-scoring Bayesian networks when learning from gene expression data. Our studies use simulated data from a model of ICU ventilator management (ALARM), a benchmark for Bayesian network learning, as well as real-world biological data from the yeast genome. The second application uses the consensus priors over gene relationships to create visualization tools for large-scale datasets. We provide results for examples in both yeast and mouse that demonstrate how a collection of weakly suggested or sometimes unreliable relationships can be combined to create a powerful and useful working model of interactions, allowing biologists to better understand an overwhelming amount of information.
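The abstract does not say which combination rule or prior form the thesis uses, so the following is only a minimal sketch of the general idea under stated assumptions. It pools per-expert probabilities for a gene pair with a weighted log-odds rule (a logarithmic opinion pool, chosen here purely for illustration), then scores a candidate set of edges by treating each pair as an independent Bernoulli prior. The function names, the weights, and the independence assumption are all hypothetical and are not taken from the thesis.

```python
import math


def consensus_probability(expert_probs, weights=None):
    """Pool per-expert probabilities that two genes are related.

    Illustrative weighted log-odds pooling; NOT necessarily the
    combination rule used in the thesis.
    """
    eps = 1e-9  # keep probabilities away from 0 and 1 before taking logs
    if weights is None:
        weights = [1.0] * len(expert_probs)
    log_odds = 0.0
    for p, w in zip(expert_probs, weights):
        p = min(max(p, eps), 1.0 - eps)
        log_odds += w * math.log(p / (1.0 - p))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


def log_structure_prior(edges, pair_probs):
    """Log prior of a candidate edge set under independent pair priors.

    `edges` is a set of frozenset gene pairs present in the candidate
    network; `pair_probs` maps each frozenset pair to its consensus
    probability. Both representations are assumptions of this sketch.
    """
    eps = 1e-9
    total = 0.0
    for pair, p in pair_probs.items():
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if pair in edges else math.log(1.0 - p)
    return total


# Hypothetical example: three weak experts for pair (A, B), two for (A, C).
pair_probs = {
    frozenset(("A", "B")): consensus_probability([0.6, 0.6, 0.6]),
    frozenset(("A", "C")): consensus_probability([0.4, 0.3]),
}
candidate = {frozenset(("A", "B"))}
prior_score = log_structure_prior(candidate, pair_probs)
```

In a structure search, a term like `prior_score` would be added to the log marginal likelihood of the expression data for each candidate network, so that candidates agreeing with the pooled prior knowledge score higher. Under this pooling rule, several experts that each weakly favor a relationship produce a consensus stronger than any one of them.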
Keywords/Search Tags:Prior, Bayesian networks, Data, Gene, Interactions