Improving statistical inference for gene expression profiling data by borrowing information

Posted on:2011-08-15

Degree:Ph.D

Type:Thesis

University:Iowa State University

Candidate:Qu, Long

GTID:2444390002460058

Subject:Biology

Abstract/Summary:

Gene expression profiling experiments, in particular, microarray experiments, are popular in genomics research. However, in addition to the great opportunities provided by such experiments, statistical challenges also arise in the analysis of expression profiling data. The current thesis discusses statistical issues associated with gene expression profiling experiments and develops new statistical methods to tackle some of these problems.

In Chapter 2, we consider the insufficient sample size problem in detecting differential gene expression. We address the problem by developing and evaluating methods for variance model selection. The idea is that information about error variances might be learned from related datasets to improve the estimation of error variances. We develop a modified multiresponse permutation procedure (MRPP), modified cross-validation procedures, and the right AICc (corrected Akaike's information criterion) for choosing a variance model. Through realistic simulations based on three real microarray studies, we evaluate the proposed methods and suggest practical recommendations for data analysis.

In Chapter 3, we address the multiple testing problem by improving the estimation of the distribution of noncentrality parameters given a large number of two-sample t-tests. We provide parametric, nonparametric and semiparametric estimators for the distribution of noncentrality parameters, as well as false discovery rates (FDR) and local FDR. Simulations show that our density estimates are closer to the underlying truth and that our estimates of FDR are also improved relative to competing methods under a variety of situations.

In Chapter 4, we develop a novel combination of two statistical techniques with the aim to by-pass the curse of dimensionality problem in detecting differential expression of genes. We accept the fact that, in "small N, large p" situations, the data are not sufficient to provide enough information about dependency across genes. Hence, we suggest using a priori biological knowledge to assist statistical inference. We first use multidimensional scaling (MDS) methods to summarize prior knowledge about inter-gene relationships into a set of pseudo-covariates. Then, we develop a hierarchical additive logistic regression model conditional upon the generated pseudo-covariates. Simulations and analysis of real microarray data suggest that our strategy is more powerful than methods that do not use a priori information.

Future research directions are discussed at the end of the thesis.
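
As a rough illustration of the pseudo-covariate step described for Chapter 4, the sketch below turns a matrix of inter-gene distances into low-dimensional coordinates. The abstract does not say which MDS variant the thesis uses, so classical (Torgerson) MDS is assumed here purely for illustration. The function name `classical_mds`, the toy distance matrix `gene_dist`, and the choice of one output dimension are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS.

    Embeds items in `n_components` dimensions so that Euclidean
    distances between rows approximate the entries of `dist`.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding error.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical prior-knowledge distances among four genes (assumed values).
gene_dist = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 4.0],
    [4.0, 3.0, 0.0, 1.0],
    [5.0, 4.0, 1.0, 0.0],
])

# One pseudo-covariate per gene; the sign of each axis is arbitrary.
pseudo_covariates = classical_mds(gene_dist, n_components=1)
```

In this toy case the distances are exactly one-dimensional, so a single coordinate reproduces them. With real prior knowledge the number of retained dimensions would be a modelling choice, and the resulting columns would then enter a downstream model as covariates.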

Keywords/Search Tags:

Expression profiling, Gene expression, Information, Statistical, Data, Methods, Experiments

Related items

1	Statistical analyses of gene expression data derived from cDNA microarray experiments of bone regeneration
2	Statistical methods in the design and analysis of gene expression data from cDNA microarray experiments
3	Study On Statistical Methods For Differential Gene Expression Detection Based On Sample Subsets
4	Statistical analysis of gene expression data in cDNA microarray experiments
5	The Study Of Gene Set Analysis Methods On Gene Expression Profiles And Its Applications In Medicine
6	Decision Forests-Based Study Of Tumor Gene Expression Profiling Data Analysis
7	Rank-based methods for statistical analysis of gene expression microarray data
8	Study On The Statistical Methods In Classifying Samples By Gene Expression Profile
9	Statistical methods and software for high-throughput gene expression experiments
10	Statistical pattern recognition methods for diagnosis of cancer using gene expression data