Large-scale gene expression data analysis and management

Posted on: 2003-07-04
Degree: Ph.D
Type: Thesis
University: The Pennsylvania State University
Candidate: Mitra, Madhusmita
Full Text: PDF
GTID: 2460390011984670
Subject: Biology
Abstract/Summary:
In the post-sequencing era the amount of data being made available to researchers is increasing exponentially. Translating these data into information, that is, data mining, is currently the rate-limiting step and a worthwhile pursuit. The two important components of data mining are (a) data analysis and (b) data management. My Ph.D. thesis focuses on both of these areas in the context of large-scale gene expression data.

Data Analysis. We have analyzed previously published sets of DNA microarray gene expression data by singular value decomposition to uncover underlying patterns, or “characteristic modes”, in their temporal profiles. These patterns contribute unequally to the structure of the expression profiles. Moreover, the essential features of a given set of expression profiles are captured using just a small number of characteristic modes. This leads to the striking conclusion that the transcriptional response of a genome is orchestrated in a few fundamental patterns of gene expression change. These patterns are both simple and robust, dominating the alterations in expression of genes throughout the genome. Furthermore, the characteristic modes of gene expression change in response to environmental perturbations are similar in organisms as distant as yeast and human cells. This analysis reveals simple regularities in the seemingly complex transcriptional transitions of diverse cells to new states, and these regularities provide insights into the operation of the underlying genetic networks.

Data Management. We have designed and implemented a microarray database, StressDB, for the management of microarray data from our studies on stress-modulated genes in Arabidopsis. StressDB provides small user groups with a locally installable, web-based relational microarray database. It has a simple and intuitive architecture and has been designed for users of cDNA microarray technology. StressDB uses Windows™ 2000 as the operating system of the centralized database server and Oracle™ 8i as the relational database management system. It allows users to load, query and analyze microarray data and data-related biological information over the Internet using a web browser. The source code is currently available on request from the authors and will soon be made freely available for downloading from our website at http://arastressdb.cac.psu.edu.
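The singular value decomposition step described under Data Analysis can be illustrated with a minimal sketch. This is not the thesis code: the function names, the per-gene centering, and the synthetic data (two hidden temporal patterns plus small noise) are assumptions made only for illustration. It shows how a genes-by-time-points matrix decomposes into characteristic modes and how the squared singular values indicate the share of the expression structure that each mode captures.

```python
import numpy as np


def characteristic_modes(expression, n_modes=2):
    """Return the leading temporal modes and the variance fraction of every mode.

    `expression` is a genes x time-points matrix. Centering each gene's
    profile is an assumption of this sketch, not a step stated in the abstract.
    """
    X = np.asarray(expression, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)  # remove each gene's mean level
    # Rows of Vt are temporal patterns ("characteristic modes").
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    fractions = s**2 / np.sum(s**2)  # share of total variance per mode
    return Vt[:n_modes], fractions


def synthetic_profiles(n_genes=200, n_times=12, seed=0):
    """Hypothetical data: every gene mixes two temporal patterns, plus noise."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_times)
    patterns = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    weights = rng.normal(size=(n_genes, 2))
    noise = 0.05 * rng.normal(size=(n_genes, n_times))
    return weights @ patterns + noise


profiles = synthetic_profiles()
modes, fractions = characteristic_modes(profiles, n_modes=2)
print("variance captured by the first two modes:", fractions[:2].sum())
```

On data built this way, the first two modes account for nearly all of the variance, which mirrors the abstract's observation that a small number of characteristic modes captures the essential features of the expression profiles. Real microarray data would need its own normalization choices.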
Keywords/Search Tags: Data, Gene expression, Available, Management