
Bayesian Network Learning and Applications in Bioinformatics

Posted on: 2013-06-30
Degree: Ph.D.
Type: Thesis
University: University of Kansas
Candidate: Lin, Xiaotong
GTID: 2458390008974941
Subject: Bioinformatics
Abstract/Summary:
A Bayesian network (BN) is a compact graphical representation of the probabilistic relationships among a set of random variables. The advantages of the BN formalism include its rigorous mathematical basis, its locality in both knowledge representation and inference, and its natural way of handling uncertainty. Over the past decades, BNs have attracted increasing interest in many areas, including bioinformatics, which studies mathematical and computational approaches to understanding biological processes.

In this thesis, I develop new methods for BN structure learning, with applications to biological network reconstruction and assessment. The first application is reconstructing the genetic regulatory network (GRN), in which each gene is modeled as a node and an edge indicates a regulatory relationship between two genes. In this task, we are given time-series microarray gene expression measurements for tens of thousands of genes; these measurements can be modeled as the true gene expression levels mixed with noise arising from data generation, from variability of the underlying biological systems, and from other sources. We develop a novel BN structure learning algorithm for reconstructing GRNs.

The second application is a BN method for protein-protein interaction (PPI) assessment. PPIs are the foundation of most biological mechanisms, and knowledge of PPIs is one of the most valuable resources from which annotations of genes and proteins can be discovered. Recently developed high-throughput experimental technologies have been used to reveal protein interactions in many organisms. However, high-throughput interaction data often contain a large number of spurious interactions. In this thesis, I develop a novel in silico model for PPI assessment, based on a BN that integrates heterogeneous data sources from different organisms.

The main contributions are:

1. A new concept for depicting the dynamic dependence relationships among random variables, which are widespread in biological processes, such as the relationships among genes and gene products in regulatory networks and signaling pathways. This concept leads to a novel algorithm for dynamic Bayesian network learning. We apply it to time-series microarray gene expression data and discover some missing links in a well-known regulatory pathway; supporting evidence for these new causal relationships between genes has been found in the literature.
2. Discovery and theoretical proof of an asymptotic property of the K2 algorithm (a well-known, efficient BN structure learning approach). This property is used to identify Markov blankets (MBs) in a Bayesian network and then to recover the BN structure. The resulting hybrid algorithm is evaluated on a benchmark regulatory pathway and obtains better results than several state-of-the-art Bayesian learning approaches.
3. A Bayesian-network-based integrative method that incorporates heterogeneous data sources from different organisms to predict protein-protein interactions (PPIs) in a target organism. The framework is applied to human PPI prediction and to the assessment of high-throughput PPI data. Furthermore, our experiments reveal some interesting biological results.
4. The learning of a TAN (Tree Augmented Naive Bayes) based network, which offers computational simplicity and robustness for high-throughput PPI assessment. The empirical results show that our method outperforms naive Bayes and a manually constructed Bayesian network, and additionally demonstrate that sufficient information from model organisms can achieve high accuracy in PPI prediction.
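For reference, the compactness of a BN comes from the standard factorization of the joint distribution over a directed acyclic graph, where Pa(X_i) denotes the parents of X_i. For time series, a common textbook first-order dynamic BN unrolls the same idea across time slices; this is shown only as background and is not necessarily the new dynamic dependence concept proposed in this thesis.

```latex
% Static BN over X_1, ..., X_n, with Pa(X_i) the parents of X_i in the DAG
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

% First-order dynamic BN over time slices t = 1, ..., T,
% where the parents of X_i[t] are drawn from slices t-1 and t
P\bigl(\mathbf{X}[1], \dots, \mathbf{X}[T]\bigr)
  = P\bigl(\mathbf{X}[1]\bigr) \prod_{t=2}^{T} \prod_{i=1}^{n}
    P\bigl(X_i[t] \mid \mathrm{Pa}(X_i[t])\bigr)
```

As a worked count: for n binary variables in which every node has at most k parents, the factorization needs at most n·2^k free parameters, compared with 2^n − 1 for an unrestricted joint table. With n = 20 and k = 3, that is 160 versus 1,048,575.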
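The K2 algorithm referred to in contribution 2 is the greedy search of Cooper and Herskovits (1992). Given a fixed node ordering, each node repeatedly adds the single preceding node that most increases its K2 score, and it stops when no addition helps or a parent limit is reached. The minimal Python sketch below implements that standard procedure for discrete data. It is not the thesis's hybrid Markov-blanket algorithm, and the function names, the list-of-tuples data layout, and the `max_parents` default are illustrative assumptions.

```python
import math
from collections import Counter


def k2_log_score(data, child, parents, arity):
    """Log of the K2 (Cooper-Herskovits) score for `child` given a candidate parent list.

    data: list of tuples of discrete states, one tuple per sample (assumed layout)
    arity: arity[v] is the number of states of variable v
    """
    r = arity[child]
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    for n in n_ij.values():
        # log[(r - 1)! / (N_ij + r - 1)!]; unseen parent configurations contribute 0
        score += math.lgamma(r) - math.lgamma(n + r)
    for n in n_ijk.values():
        score += math.lgamma(n + 1)  # log(N_ijk!)
    return score


def k2(data, order, arity, max_parents=2):
    """Greedy K2 search: each node may only take parents that precede it in `order`."""
    parents = {v: [] for v in order}
    for pos, child in enumerate(order):
        best = k2_log_score(data, child, parents[child], arity)
        while len(parents[child]) < max_parents:
            # Score every remaining predecessor as one additional parent.
            scored = [
                (k2_log_score(data, child, parents[child] + [cand], arity), cand)
                for cand in order[:pos]
                if cand not in parents[child]
            ]
            if not scored:
                break
            new_score, new_parent = max(scored)
            if new_score <= best:
                break  # no single addition improves the score
            best = new_score
            parents[child].append(new_parent)
    return parents
```

For example, on data where the second variable copies the first and the third varies independently, `k2(data, [0, 1, 2], {0: 2, 1: 2, 2: 2})` assigns variable 0 as the only parent of variable 1 and no parents to the others. The dependence on a node ordering is the main practical limitation of plain K2.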
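Contribution 2 relies on Markov blankets. In a BN, the Markov blanket of a node consists of its parents, its children, and its children's other parents; conditioned on this set, the node is independent of all remaining variables. The thesis identifies blankets from data, whereas the minimal sketch below only reads the blanket off a known structure, assuming the DAG is stored as a hypothetical `{node: [parents]}` dictionary.

```python
def markov_blanket(parents, node):
    """Parents, children, and the children's other parents of `node`.

    parents: dict mapping every node to the list of its parents (assumed to be a DAG)
    """
    children = {v for v, pa in parents.items() if node in pa}
    # "Spouses": other parents of this node's children.
    spouses = {p for child in children for p in parents[child]} - {node}
    return set(parents[node]) | children | spouses
```

For the DAG A → C ← B, C → D, D → E, the blanket of A is {B, C}: its child C plus C's other parent B.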
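Contribution 4 uses naive Bayes as a baseline. A common naive Bayes way to combine several lines of PPI evidence is to multiply the prior odds that a protein pair interacts by each evidence source's likelihood ratio. The sketch below is a generic illustration of that baseline, not the thesis's integrative BN or TAN model; it assumes discretized evidence values, binary labels (1 = interacting), and Laplace smoothing, and all names are hypothetical.

```python
import math
from collections import Counter


def fit_likelihood_ratios(evidence_rows, labels, alpha=1.0):
    """Per evidence source, estimate LR(v) = P(v | interacting) / P(v | not interacting).

    evidence_rows: list of tuples of discretized evidence values, one tuple per protein pair
    labels: 1 for a known interacting pair, 0 for a known non-interacting pair
    alpha: Laplace smoothing pseudo-count
    """
    ratios = []
    for s in range(len(evidence_rows[0])):
        values = {row[s] for row in evidence_rows}
        pos = Counter(row[s] for row, y in zip(evidence_rows, labels) if y == 1)
        neg = Counter(row[s] for row, y in zip(evidence_rows, labels) if y == 0)
        n_pos, n_neg, k = sum(pos.values()), sum(neg.values()), len(values)
        ratios.append({
            v: ((pos[v] + alpha) / (n_pos + alpha * k)) / ((neg[v] + alpha) / (n_neg + alpha * k))
            for v in values
        })
    return ratios


def posterior_interaction(evidence, ratios, prior):
    """Naive Bayes: posterior odds = prior odds * product of per-source likelihood ratios."""
    log_odds = math.log(prior / (1.0 - prior))
    for s, value in enumerate(evidence):
        log_odds += math.log(ratios[s][value])  # KeyError if value never seen in training
    return 1.0 / (1.0 + math.exp(-log_odds))
```

The product of likelihood ratios is exactly where the naive independence assumption enters; a TAN model relaxes it by letting each evidence source depend on one other source in addition to the class.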
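TAN, mentioned in contribution 4, is usually learned with the procedure of Friedman, Geiger, and Goldszmidt (1997): weight every pair of features by their conditional mutual information given the class, build a maximum-weight spanning tree over the features, orient it away from a chosen root, and make the class a parent of every feature. A minimal sketch of that structure-learning step, assuming discrete data rows with the class stored in column `c` (names and data layout are illustrative, not taken from the thesis):

```python
import math
from collections import Counter
from itertools import combinations


def cond_mutual_info(rows, i, j, c):
    """Empirical conditional mutual information I(X_i; X_j | C) in nats."""
    n = len(rows)
    n_ijc = Counter((r[i], r[j], r[c]) for r in rows)
    n_ic = Counter((r[i], r[c]) for r in rows)
    n_jc = Counter((r[j], r[c]) for r in rows)
    n_c = Counter(r[c] for r in rows)
    total = 0.0
    for (xi, xj, y), k in n_ijc.items():
        # P(xi, xj, y) * log[P(xi, xj | y) / (P(xi | y) * P(xj | y))]
        total += (k / n) * math.log(k * n_c[y] / (n_ic[(xi, y)] * n_jc[(xj, y)]))
    return total


def tan_structure(rows, features, c):
    """Return {feature: tree parent or None}; the class c is an extra parent of every feature."""
    weights = {}
    for a, b in combinations(features, 2):
        weights[(a, b)] = weights[(b, a)] = cond_mutual_info(rows, a, b, c)
    root = features[0]
    tree_parent = {root: None}
    # Prim's algorithm for a maximum-weight spanning tree, oriented away from the root.
    while len(tree_parent) < len(features):
        a, b = max(
            ((a, b) for a in tree_parent for b in features if b not in tree_parent),
            key=lambda edge: weights[edge],
        )
        tree_parent[b] = a
    return tree_parent
```

For example, if features 0 and 1 are identical copies and feature 2 is independent of both given the class, feature 1 receives feature 0 as its tree parent. Restricting each feature to a single extra parent keeps learning polynomial, which is the computational simplicity the abstract refers to.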
Keywords/Search Tags: Bayesian network, PPI, BN structure learning, Relationships among, Organisms