Discovering functionally coherent gene sets using heterogeneous information sources: A graphical framework

Posted on: 2012-02-20
Degree: Ph.D
Type: Dissertation
University: Medical University of South Carolina
Candidate: Richards, Adam J
Full Text: PDF
GTID: 1458390011457561
Subject: Biology
Abstract/Summary:
In this era of emerging technologies, where data collection often outpaces the development of analytical tools, data management and integration have become crucial areas of research for the advancement of numerous fields in science and engineering. In the biomedical sciences, the need to integrate heterogeneous data sources is particularly evident; take, for example, the myriad applications in systems biology, biomarker discovery, and high-throughput data analysis. This work focuses on finding the handful of needles in the proverbial haystack by making use of as many tools as possible. The needles are the genes of interest and the tools are the available data sources.

Many laboratory experiments carried out in biomedicine produce results in the form of one or more gene/protein sets, either directly or as a secondary result following an initial data analysis. For the biologist, these gene lists can be large and unwieldy, which makes it difficult to focus on the most important subsets of the results. Information used to study gene sets comes in several forms, including genomic sequence, established databases, and derived data such as those from statistical models. Because no single data source sufficiently represents all possible aspects of biology, it is of interest to integrate these data.

Biological experiments, more often than not, contain noise, so multiple data sources also serve to augment our degree of belief in a particular conclusion. Recognizing this and drawing on recent developments in graph theory, we implemented a strategy for data integration that involves kernel transforming the data and projecting it into an eigen-decomposition subspace, with the goal of mining the constituent genes for interesting subsets.

Ultimately, this work resulted in three distinct projects: one that aims to help computational biologists manage and integrate large amounts of data using; a second that, when provided with an arbitrary gene set, uses graph-theoretic means to assess and measure functional coherence; and finally a third that mines gene sets for biologically interesting subsets, based on information from a heterogeneous set of data sources.
Keywords/Search Tags:Data, Gene sets, Sources, Information