Statistical methods for gene expression analysis from cDNA microarrays

Posted on: 2002-07-06

Degree: Ph.D.

Type: Dissertation

University: University of California, Berkeley

Candidate: Bryan, Jennifer Frazier

GTID: 1460390011498138

Subject: Biology

Abstract/Summary:

Recent developments in microarray technology make it possible to capture the gene expression profiles of thousands of genes at once. With these data, researchers are tackling problems ranging from the identification of “cancer genes” to the formidable task of adding functional annotations to our rapidly growing gene databases. Specific research questions suggest patterns of gene expression that are interesting and informative, for instance, genes with large variance or groups of genes that are highly correlated. Cluster analysis and related techniques are proving to be very useful. However, such exploratory methods alone do not provide the opportunity to engage in statistical inference. Given the high dimensionality (thousands of genes) and small sample sizes (often <30) encountered in these datasets, an honest assessment of sampling variability is crucial and can prevent the over-interpretation of spurious results. We describe a statistical framework that encompasses many of the analytical goals in gene expression analysis; our framework is completely compatible with many of the current approaches and, in fact, can increase their utility. We propose the use of a deterministic rule, applied to the parameters of the gene expression distribution, to select a target subset of genes that are of biological interest. In addition to subset membership, the target subset can include information about relationships between genes, such as clustering. This target subset is a parameter of interest that we can estimate by applying the rule to the sample statistics of microarray data. The parametric bootstrap, based on a multivariate normal model, is used to estimate the distribution of these estimated subsets, and relevant summary measures of this sampling distribution are proposed. We focus on rules that operate on the mean and covariance.
Using Bernstein's inequality, we obtain consistency of the subset estimates under the assumption that the sample size grows to infinity faster than the logarithm of the number of genes. We also provide a conservative sample size formula guaranteeing that the sample mean and sample covariance matrix are uniformly within a distance ε > 0 of the population mean and covariance. The practical performance of the method using a cluster-based subset rule is illustrated with simulation studies and with an analysis of a publicly available leukemia data set. We describe extensions of the method to settings in which multiple populations are compared or gene expression is measured over time or at different values of a covariate.
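
The estimation procedure summarized in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation. It assumes a deliberately simple, hypothetical subset rule (the k genes with the largest variance) in place of the cluster-based rule used in the dissertation, and it reports a single summary measure, the proportion of bootstrap replicates in which each gene is selected. The function names, the parameter k, and the number of bootstrap replicates are all assumptions made for illustration.

```python
import numpy as np


def top_variance_rule(mean, cov, k=3):
    """Hypothetical deterministic rule: the k genes with the largest variance.

    The rule takes (mean, covariance) so that it can be applied both to
    population parameters and to sample statistics. This simple rule
    happens to ignore the mean.
    """
    variances = np.diag(cov)
    return frozenset(np.argsort(variances)[-k:].tolist())


def bootstrap_subset_frequencies(data, rule, n_boot=200, seed=0):
    """Parametric bootstrap of a subset rule under a multivariate normal model.

    data : array of shape (n_samples, n_genes)
    rule : function (mean, cov) -> set of selected gene indices
    Returns the estimated subset and, for each gene, the proportion of
    bootstrap replicates in which that gene was selected.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape

    # Sample statistics play the role of the fitted normal model.
    mean_hat = data.mean(axis=0)
    cov_hat = np.cov(data, rowvar=False)

    # Estimated subset: the rule applied to the sample statistics.
    estimated = rule(mean_hat, cov_hat)

    counts = np.zeros(p)
    for _ in range(n_boot):
        # Draw a data set of the same size from N(mean_hat, cov_hat).
        resample = rng.multivariate_normal(mean_hat, cov_hat, size=n)
        subset = rule(resample.mean(axis=0), np.cov(resample, rowvar=False))
        counts[list(subset)] += 1

    return estimated, counts / n_boot
```

Genes that reappear in nearly every bootstrap replicate are stably selected, while genes with low reappearance proportions signal that their membership in the estimated subset may reflect sampling variability.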

Keywords/Search Tags:

Gene expression, Statistical

Related items

1	Statistical methods for gene expression analysis from cDNA microarrays
2	Statistical methods for the analysis of expression quantitative traits
3	A Study On Eukaryotic Gene Expression Regulatory Systems Based On Statistical Modeling Methods
4	Computationally intensive statistical methods for analysis of gene expression data
5	Statistical and computational methods for molecular signature analysis with applications
6	DNA microarray analysis and statistical validation for expression profiling in Escherichia coli
7	Research On Statistical Analysis Methods Of Gene Expression Data Based On Metric Learning
8	A Self-Feedback GEP Algorithm And Its Applications On Statistical Modeling
9	Study On Statistical Methods For Analyzing Gene Expression Microarray Data Under Mixed Linear Model Framework
10	Statistical methods for gene set co-expression analysis