Computationally intensive statistical methods for analysis of gene expression data

Posted on: 2004-04-09
Degree: Ph.D
Type: Thesis
University: University of California, Berkeley
Candidate: Pollard, Katherine Snowden
Full Text: PDF
GTID: 2450390011453450
Subject: Statistics
Abstract/Summary:
In recent years, gene expression experiments have become increasingly common in molecular biology and biomedical research. Utilizing high density array technologies, researchers are now able to simultaneously measure the expression of thousands of genes in one or more samples. Typically, array experiments are performed for a sample of subjects (e.g. patients, cells, mice) drawn from a population of interest. Because the number of genes studied far exceeds the number of samples, statistical rigor is particularly important in this setting, and new statistical methods are needed for appropriate and accurate data analysis.

Questions of interest include how to identify (i) statistically significant subsets of genes (e.g. genes differently expressed in two populations); (ii) groups of genes whose expression patterns across subjects are significantly correlated, since such genes might be part of the same causal mechanism or pathway; (iii) subpopulations of subjects whose gene expression profiles are significantly correlated; and (iv) groups of genes whose expression patterns have a significantly similar association with an outcome (e.g. survival, disease progression, phenotype). Each of these problems provides an opportunity for statistical inference, and methods are needed which adequately address the high dimension of the data.

We describe a general statistical framework for analysis of gene expression data, including bootstrap methods for assessing the reliability and repeatability of an experiment. We also propose specific new methods for multiple hypothesis testing (question (i)), clustering genes and subjects, possibly simultaneously (questions (ii) and (iii)), and supervised clustering (question (iv)). The asymptotic validity and finite sample performance of these computationally intensive statistical techniques are studied in simulations. Their power to answer biologically important questions is then demonstrated on a collection of experimental data sets, some of which are publicly available.
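
As a rough illustration of the kind of bootstrap resampling the abstract mentions, the sketch below resamples subjects with replacement to gauge how stable a single gene's difference in mean expression is between two groups. It is a generic nonparametric bootstrap, not the thesis's own procedure; the function names (`bootstrap_mean_differences`, `percentile_interval`), the parameters, and the expression values are all hypothetical and chosen only for the example.

```python
import random
import statistics


def bootstrap_mean_differences(group_a, group_b, n_boot=1000, seed=0):
    """Resample subjects with replacement within each group and return the
    bootstrap replicates of the difference in mean expression for one gene."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        resampled_a = rng.choices(group_a, k=len(group_a))
        resampled_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    return diffs


def percentile_interval(values, level=0.95):
    """Simple percentile interval computed from bootstrap replicates."""
    ordered = sorted(values)
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * (len(ordered) - 1))
    hi_idx = int((1.0 - alpha) * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical log-expression values for one gene in two small groups.
gene_group_a = [2.1, 2.4, 1.9, 2.8, 2.5]
gene_group_b = [1.2, 1.5, 0.9, 1.4, 1.1]

diffs = bootstrap_mean_differences(gene_group_a, gene_group_b)
low, high = percentile_interval(diffs)
print(f"95% bootstrap percentile interval: [{low:.2f}, {high:.2f}]")
```

In an array experiment this would be repeated for each of thousands of genes, which is what makes such methods computationally intensive and why the resulting per-gene statements then need a multiple testing adjustment.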
Keywords/Search Tags:Gene expression, Data, Statistical, Methods