
A probabilistic approach to data integration in biomedical research: The IsBIG experiments

Posted on: 2011-10-27
Degree: Ph.D
Type: Dissertation
University: Indiana University
Candidate: Anand, Vibha
Full Text: PDF
GTID: 1448390002960466
Subject: Health Sciences
Abstract/Summary:
Biomedical research has produced vast amounts of new information in the last decade, but this information has been slow to find use in clinical applications. Data from disparate sources, such as genetic studies and summary data from published literature, have been amassed, but there is a significant gap in combining such information for inference and knowledge discovery, primarily due to a lack of normative methods.

In this research, a probabilistic framework is built using Bayesian Networks (BN) to address this gap. BN are a relatively new method of representing uncertain relationships among variables using probabilities and graph theory. Although inference in BN is computationally complex, BN represent domain knowledge concisely. In this work, strategies using BN have been developed to incorporate a range of available information, from both raw data sources and statistical and summary measures, in a coherent framework. As an example of this framework, a prototype model (In-silico Bayesian Integration of GWAS, or IsBIG) has been developed. IsBIG integrates summary and statistical measures from the NIH catalog of genome-wide association studies (GWAS) and the database of human genome variations from the International HapMap Project. IsBIG produces a map of disease-to-disease associations as inferred from genetic linkages in the population.

Quantitative evaluation of the IsBIG model shows correlation with empirical results from our Electronic Medical Record (EMR), the Regenstrief Medical Record System (RMRS). Only a small fraction of disease-to-disease associations in the population can be explained by linking genetic variations to disease associations as studied in the GWAS. Nonetheless, the model appears to have found novel associations among some diseases that are not described in the literature but are confirmed in our EMR.

In conclusion, our results demonstrate the potential use of a probabilistic modeling approach for combining data from disparate sources for inference and knowledge discovery in biomedical research.
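
As a rough illustration of the general idea described above (not the IsBIG model itself), the following minimal sketch shows how a Bayesian network in which one genetic variant influences two diseases induces an association between those diseases. The network structure, the variable names G, A and B, and every probability below are hypothetical values chosen for the example; none of them come from the dissertation, the GWAS catalog, or HapMap.

```python
from itertools import product

# Hypothetical numbers, for illustration only (not from the dissertation).
P_G = {0: 0.7, 1: 0.3}            # P(G): carrier status of a risk variant
P_A_GIVEN_G = {0: 0.05, 1: 0.15}  # P(A=1 | G): risk of disease A
P_B_GIVEN_G = {0: 0.10, 1: 0.20}  # P(B=1 | G): risk of disease B


def bernoulli(p_one, value):
    """Probability of a binary value given P(value == 1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(g, a, b):
    """Joint probability under the network G -> A, G -> B."""
    return P_G[g] * bernoulli(P_A_GIVEN_G[g], a) * bernoulli(P_B_GIVEN_G[g], b)


def probability(query, evidence=None):
    """P(query | evidence) by brute-force enumeration of all assignments."""
    evidence = evidence or {}
    numerator = 0.0
    denominator = 0.0
    for g, a, b in product((0, 1), repeat=3):
        world = {"G": g, "A": a, "B": b}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # assignment contradicts the evidence
        p = joint(g, a, b)
        denominator += p
        if all(world[name] == value for name, value in query.items()):
            numerator += p
    return numerator / denominator


p_b = probability({"B": 1})
p_b_given_a = probability({"B": 1}, {"A": 1})
print(f"P(B=1) = {p_b:.4f}")
print(f"P(B=1 | A=1) = {p_b_given_a:.4f}")
```

With these made-up numbers, observing disease A raises the probability of disease B from 0.13 to about 0.156, even though the two diseases are linked only through the shared variant. Enumeration is used here because the toy network has three binary variables; it does not scale to realistic networks, which is the computational cost of inference the abstract alludes to.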
Keywords/Search Tags: Data, Probabilistic, IsBIG