
Multiple testing using the posterior probability of half-space: Application to gene expression data

Posted on: 2006-02-23
Degree: Ph.D
Type: Thesis
University: University of Waterloo (Canada)
Candidate: Labbe, Aurelie
Full Text: PDF
GTID: 2458390008465143
Subject: Statistics
Abstract/Summary:
We consider the problem of testing the equality of two sample means when the number of tests performed is large. Applying this problem to the context of gene expression data, our goal is to detect a set of genes differentially expressed under two treatments or two biological conditions. A null hypothesis of no difference in gene expression under the two conditions is constructed. Since such a hypothesis is tested for each gene, thousands of tests are performed simultaneously, and multiple testing issues arise. The aim of our research is to make a connection between Bayesian analysis and frequentist theory in the context of multiple comparisons by deriving some properties shared by both p-values and posterior probabilities. The ultimate goal of this work is to use the posterior probability of the one-sided alternative hypothesis (or equivalently, the posterior probability of the half-space) in the same spirit as a p-value. We show, for instance, that such a Bayesian probability can be used as an input to some standard multiple testing procedures controlling the False Discovery Rate.

The first chapter of this thesis presents an introduction to the problem of cDNA microarray data. The underlying biological principles of this type of data, as well as the associated statistical issues, are discussed. In the second chapter, we follow the work of Dudley & Haughton (2002) regarding the asymptotic normality of posterior probabilities of half-spaces. We show that such a probability shares with the frequentist p-value the property of uniformity under the null hypothesis. This result holds asymptotically, when the number of observations available for each test is large enough. Our approach is based on the observation that uniformity under the null hypothesis (a property that p-values are assumed to have) is the main property used in the multiple testing procedure developed by Benjamini and Hochberg (1995).
We are then able to use the posterior probability as an input to this procedure, in the same spirit as a p-value. We note that such a probability can also be seen as a test statistic whose distribution under the null hypothesis is known. As a result, it can also be used in any extension of the Benjamini-Hochberg procedure, providing control of the False Discovery Rate or the False Negative Rate.

Motivated by the case of microarray data, where the number of observations per gene is small, we show in the third chapter that the uniform property holds in a non-asymptotic manner under a non-informative or a conjugate gamma model. A goodness-of-fit study on several microarray datasets is performed, as well as an extended simulation study. This gamma model is extended in the fourth chapter to a multiplicative random effect ANOVA model, taking the array and dye effects into consideration. Other models, such as inverse Gaussian models, are considered in the fifth chapter. In such cases, the uniform property of the posterior probability can be observed empirically when the sample size is small. Results using these models are very encouraging. A case study is presented in Chapter 6 using three microarray datasets resulting from a collaborative study between McMaster University and the University of Waterloo. The methods developed in this thesis are applied to these datasets and the results are compared. Our future work is described in the last chapter, which also includes a brief discussion of the work proposed in this thesis.
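
The central idea of the abstract, feeding a posterior probability of a half-space into the Benjamini-Hochberg procedure in place of a p-value, can be illustrated with a minimal Python sketch. It is not the thesis's method: it assumes a normal model with known standard deviation and a flat prior on the mean (rather than the gamma or inverse Gaussian models above), in which case the posterior probability of the half-space coincides with the one-sided p-value and is therefore uniform under the null. The function names, the simulated data, and all parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def posterior_prob_half_space(sample, sigma=1.0):
    """P(theta <= 0 | data) for a N(theta, sigma^2) sample under a flat prior.

    Assumption for illustration only: sigma is known and the prior on theta
    is flat, so the posterior is N(sample mean, sigma^2 / n). This is not
    the gamma model studied in the thesis.
    """
    n = len(sample)
    mean = sum(sample) / n
    return NormalDist().cdf(-mean * math.sqrt(n) / sigma)


def benjamini_hochberg(probs, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    m = len(probs)
    order = sorted(range(m), key=lambda i: probs[i])
    k = 0  # largest rank whose ordered probability is below its threshold
    for rank, i in enumerate(order, start=1):
        if probs[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])


def simulate(m=1000, m_alt=100, n=5, effect=2.0, seed=0):
    """Simulate m 'genes'; the first m_alt have mean `effect`, the rest mean 0."""
    rng = random.Random(seed)
    probs = []
    for g in range(m):
        theta = effect if g < m_alt else 0.0
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        probs.append(posterior_prob_half_space(sample))
    return probs


if __name__ == "__main__":
    probs = simulate()
    rejected = benjamini_hochberg(probs, q=0.05)
    false_discoveries = sum(1 for i in rejected if i >= 100)
    print(len(rejected), "rejections,", false_discoveries, "false discoveries")
```

Calling `simulate(m_alt=0)` generates every gene under the null, and a histogram of the returned probabilities should then look roughly flat on [0, 1], which is the uniformity property the Benjamini-Hochberg procedure relies on.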
Keywords/Search Tags:Posterior probability, Testing, Gene expression, Chapter, Data, Null hypothesis, Using, Work