Methods for integrating and comparing coexpression information over multiple data sets and applications in mice aging

Posted on: 2010-12-07
Degree: Ph.D
Type: Thesis
University: Stanford University
Candidate: Southworth, Lucinda Kay
Full Text: PDF
GTID: 2448390002476885
Subject: Biology
Abstract/Summary:
In this thesis, I present my results from two different research projects, both concerning the coexpression of genes across multiple microarray inputs. First, I present a differential coexpression analysis method to find significant differences in similar microarray data sets. Although I applied this method specifically to aging, it can be used to analyze any pair of closely related coexpression data sets. To date, differential expression analysis has been the foundation for the comparison of microarray data between two conditions. Although such research has yielded many interesting biological insights, traditional differential expression analysis is limited in that it can only study genes individually, ignoring potentially interesting changes in the interactions between genes. I have shown that this subtlety is especially important when studying aging, a complex process involving the cumulative effects of many different genetic pathways in various tissues.

Using differential coexpression analysis, I showed that there appear to be large-scale changes in transcription in old mice. For example, upon comparing coexpression networks for 16- and 24-month-old mice, I found that there is an overall loss of correlation with age. In addition, I found that the loss is not uniform, but rather modular: groups of coexpressed genes lose correlation together.

To find the groups of coexpressed genes that change with age, I developed the difference network framework. Consistent with the modular coexpression loss mentioned above, I discovered that many more gene groups decrease in correlation than increase. In addition to looking at many Gene Ontology categories and coexpressed gene groups, I also introduced a way to cluster genes directly on the difference network to find groups of genes whose correlation changes with age.

Because the difference network framework quantifies a gene's propensity for coexpression loss, it allowed me to query various transcriptional regulatory mechanisms for their role in the observed differences. I located a number of transcription factors whose computationally predicted targets decrease in correlation with age. Of these, the NF-kappaB transcription factor seems the most promising, as its activity has been shown to change with age. Furthermore, I found that genes that tend to lose correlation with age show locational clustering on the chromosome, hinting at a role for chromatin domains in age-regulated changes in transcription.

The second research direction I undertook involved using the ProbGR algorithm to locate genes that are coexpressed with a query set of genes in a relevant subset of available microarray data. ProbGR uses an iterative Bayes approach to select experiments where the query set is optimally coexpressed. In turn, these experiments are used to find other genes that are relevant to the query. As an example, I used ProbGR to find previously unknown functional relationships between genes involved in the iron transport pathway in a subset of 753 yeast microarray experiments. ProbGR is not restricted to microarray data; rather, it is applicable to any large high-throughput data set. Applying ProbGR to ChIP-chip data, I found that telomerically located genes that are involved in DNA-dependent ATPase activity are bound by the same transcription factor, CIN5. Finally, I extended ProbGR to identify joint relationships across multiple different data types.
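The difference-network idea from the first project can be illustrated with a minimal sketch. The abstract does not give the thesis's exact definition, so everything here is an assumption made for illustration: the function names (`coexpression_network`, `difference_network`, `per_gene_change`, `simulate`), the use of Pearson correlation, the choice of "change in absolute correlation" as the edge weight, and the synthetic data. The sketch builds one coexpression network per age group, subtracts them, and summarizes each gene's average change; negative values indicate coexpression loss with age.

```python
import numpy as np


def coexpression_network(expr):
    # expr is a genes x samples array; np.corrcoef treats rows as variables,
    # so the result is a genes x genes Pearson correlation matrix.
    return np.corrcoef(expr)


def difference_network(expr_young, expr_old):
    # Assumed edge weight (not taken from the thesis): change in absolute
    # correlation for each gene pair. Negative entries mean the pair is
    # less correlated in the old group than in the young group.
    return np.abs(coexpression_network(expr_old)) - np.abs(
        coexpression_network(expr_young)
    )


def per_gene_change(diff):
    # Average change across all other genes, excluding the diagonal.
    n = diff.shape[0]
    off_diag_sum = diff.sum(axis=1) - np.diag(diff)
    return off_diag_sum / (n - 1)


def simulate(n_genes=20, n_samples=50, module_size=5, seed=0):
    # Synthetic data only: the first `module_size` genes share a latent
    # signal in the young group and are independent in the old group.
    rng = np.random.default_rng(seed)
    young = rng.normal(size=(n_genes, n_samples))
    old = rng.normal(size=(n_genes, n_samples))
    shared = rng.normal(size=n_samples)
    young[:module_size] = shared + 0.3 * rng.normal(size=(module_size, n_samples))
    return young, old


young, old = simulate()
diff = difference_network(young, old)
scores = per_gene_change(diff)
print("mean change, module genes:", scores[:5].mean())
print("mean change, other genes: ", scores[5:].mean())
```

In this toy setup the simulated module loses correlation as a group, which mirrors the modular loss described in the abstract; clustering the rows of `diff` would be one way to look for such groups without knowing them in advance.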
Keywords/Search Tags: Data, Coexpression, Multiple, Genes, Different, ProbGR, Correlation with age, Transcription