Multivariate analysis of gene expression data and functional information: Automated methods for functional genomics

Posted on: 2006-07-11
Degree: Ph.D
Type: Dissertation
University: Portland State University
Candidate: Rechtsteiner, Andreas
Full Text: PDF
GTID: 1453390008470045
Subject: Biology
Abstract/Summary:
Increasing amounts of data obtained from high-throughput experiments in molecular biology require new analysis methods. One such high-throughput technology uses microarrays to measure the expression levels of thousands of genes simultaneously. Measuring the expression of many genes, even whole genomes, has proven useful for understanding the molecular basis of diseases and has allowed for a more systems-level view of cellular processes. Besides the statistical analysis of such large amounts of data, another challenge is the biological interpretation of the analysis results. Biological experts trying to understand the biological meaning of the expression results are often overwhelmed by the amount of functional information in the literature and databases.
We developed algorithms that address both of these challenges. The Clustering in SVD Subspace (CSS) algorithm identifies similarly regulated genes in time series gene expression data by identifying gene clusters in two-dimensional Singular Value Decomposition (SVD) subspaces. The MeSH Functional Theme Finder (MFTF) algorithm was developed for the discovery of biomedical functional themes associated in the literature with groups of genes or proteins, e.g. groups obtained from high-throughput experiments.
The CSS algorithm has been applied to two expression data sets, one obtained during the yeast cell cycle and the second after herpes cytomegalovirus infection of human fibroblast cells. The algorithm successfully identified clusters of genes whose expression is similarly regulated during the respective cellular processes. The MFTF algorithm was applied to the gene clusters identified via the CSS algorithm in the herpes data set. The MFTF algorithm identified, in an automated way, the same main functional themes that were identified by a biological expert. In addition, the algorithm identified new relevant functional themes associated with the gene clusters.
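As a rough illustration of clustering genes in a two-dimensional SVD subspace, the Python sketch below projects toy expression time series onto the two leading right singular vectors and groups genes whose projections point in similar directions. It is not the dissertation's CSS implementation: the power-iteration SVD, the angular tolerance, the minimum-radius filter, the function names, and the synthetic sine/cosine data are all assumptions made for illustration.

```python
import math

def top_right_singular_vectors(rows, k=2, iters=200):
    """Approximate the top-k right singular vectors of `rows` (genes x time points)
    by power iteration on A^T A with deflation. Assumes the matrix has rank >= k."""
    n_cols = len(rows[0])
    basis = []
    for idx in range(k):
        # Deterministic, non-constant start vector (an arbitrary choice).
        v = [1.0 + 0.1 * (j + idx) for j in range(n_cols)]
        for _ in range(iters):
            # Deflation: remove components along vectors already found.
            for u in basis:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            # One power-iteration step: v <- A^T (A v), then normalise.
            av = [sum(a * b for a, b in zip(row, v)) for row in rows]
            v = [sum(row[j] * c for row, c in zip(rows, av)) for j in range(n_cols)]
            norm = math.sqrt(sum(a * a for a in v))
            if norm == 0.0:
                raise ValueError("matrix rank is lower than k")
            v = [a / norm for a in v]
        basis.append(v)
    return basis

def angular_distance(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def cluster_in_subspace(expression, tolerance=math.pi / 6, min_radius=0.5):
    """Group genes whose 2-D subspace projections point in similar directions.
    `tolerance` and `min_radius` are hypothetical tuning parameters."""
    genes = list(expression)
    rows = [expression[g] for g in genes]
    u1, u2 = top_right_singular_vectors(rows, k=2)
    clusters = []  # list of (seed_angle, [gene names])
    for gene, row in zip(genes, rows):
        x = sum(a * b for a, b in zip(row, u1))
        y = sum(a * b for a, b in zip(row, u2))
        if math.hypot(x, y) < min_radius:
            continue  # skip genes with almost no signal in this subspace
        angle = math.atan2(y, x)
        for seed, members in clusters:
            if angular_distance(angle, seed) <= tolerance:
                members.append(gene)
                break
        else:
            clusters.append((angle, [gene]))
    return [members for _, members in clusters]

# Synthetic time series (not real expression data): two phase groups,
# one anti-correlated gene, and one flat gene.
steps = [2 * math.pi * k / 8 for k in range(8)]
sin_wave = [math.sin(t) for t in steps]
cos_wave = [math.cos(t) for t in steps]
expression = {
    "g1": sin_wave,
    "g2": [2.0 * v for v in sin_wave],
    "g3": cos_wave,
    "g4": [1.5 * v for v in cos_wave],
    "g5": [-v for v in sin_wave],
    "flat": [0.0] * 8,
}
clusters = cluster_in_subspace(expression)
print(clusters)
```

Grouping by direction rather than by distance treats genes with the same temporal pattern but different amplitudes as similarly regulated; that is a design choice of this sketch, and real data would typically also need normalisation before the decomposition.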
Finally, a large-scale validation of the MFTF algorithm is presented. The vector space model underlying the MFTF algorithm was used to correctly classify a large number of proteins into families of functionally related proteins, proving that keywords from literature can be used to capture functional relationships of proteins or genes.
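A vector space model of this kind can be sketched in a few lines of Python: each protein is represented by a vector of keyword counts drawn from its associated literature, and a query protein is assigned to the family whose centroid is closest in cosine similarity. The weighting scheme, the nearest-centroid rule, the family names, and the keyword counts below are all illustrative assumptions, not the dissertation's actual model or data.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse keyword vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average keyword vector of a protein family."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the counts keyword by keyword
    return {t: w / len(vectors) for t, w in total.items()}

def classify(query, families):
    """Return the family whose centroid is most similar to the query."""
    centroids = {name: centroid(members) for name, members in families.items()}
    return max(centroids, key=lambda name: cosine(query, centroids[name]))

# Made-up keyword counts for two hypothetical protein families.
families = {
    "kinase": [
        Counter({"Phosphorylation": 3, "Signal Transduction": 2}),
        Counter({"Phosphorylation": 2, "Protein Kinases": 4}),
    ],
    "protease": [
        Counter({"Proteolysis": 3, "Peptide Hydrolases": 2}),
        Counter({"Proteolysis": 1, "Peptide Hydrolases": 3}),
    ],
}
query = Counter({"Phosphorylation": 1, "Protein Kinases": 1})
predicted = classify(query, families)
print(predicted)
```

Raw counts are used here for brevity; a weighting such as TF-IDF would be a common alternative, and nothing in the abstract specifies which one the dissertation uses.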
Keywords/Search Tags: Functional, Data, Gene, Expression, MFTF algorithm, Proteins