
Hypothesis settings and methods for gene expression meta-analysis

Posted on: 2013-01-14
Degree: Ph.D.
Type: Thesis
University: University of Pittsburgh
Candidate: Song, Chi
Full Text: PDF
GTID: 2454390008476472
Subject: Biostatistics
Abstract/Summary:
With the advent of high-throughput technologies, biomedical research has been dramatically reshaped in the past two decades. Technologies such as microarrays are broadly used to study the relationship between genomic alterations and disease outcomes. However, genomic analyses are criticized for their low reproducibility and generalizability. Large-scale meta-analysis of multiple studies is a timely and important issue with great public health significance, because meta-analysis techniques can find robust biomarkers for complex human diseases such as major depressive disorder. Accurate marker detection will improve disease diagnosis, treatment selection and prognosis prediction.

In this dissertation, I first illustrate the hypothesis settings for two types of biomarkers: biomarkers that are differentially expressed (DE) "in all" studies and biomarkers that are DE "in any" study. Then I propose a robust setting, HSr, to detect genes that are DE "in the majority of" studies. For HSr, I propose an order statistic of the p-values across the combined studies (the rth order p-value, rOP) as the test statistic. I also explore statistical properties of rOP, such as its power and asymptotic behavior. The method is applied to three examples to demonstrate its robustness and sensitivity. I develop two methods to guide the selection of r.

The non-complementary property of HSr causes anti-conservative inferences. To overcome this, I propose HS'r as a complementary form of HSr. For HS'r, the major obstacle comes from the mixture nature of the null distribution. From a Bayesian point of view, I propose a semiparametric mixture model for the observed p-values in the combined studies. A Bayes factor is calculated from the posterior distribution as a substitute for traditional hypothesis testing under HS'r. I also develop an expectation-maximization (EM) algorithm to fit this model.
Simulation results and real data analysis show that this novel approach improves specificity and sensitivity compared to traditional methods.

Beyond meta-analysis of single genes, I also propose a framework to integrate multiple biological networks. A conservative subnetwork in a subset of datasets can be identified using my approach.

In conclusion, this dissertation discusses various interesting questions in genomic meta-analysis and provides a series of statistical tools to address them.
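
As an illustration of the rOP statistic described in the abstract, the following is a minimal sketch, not the dissertation's implementation. It takes the K per-study p-values for one gene and returns the rth smallest. For the significance calculation it assumes the K p-values are independent and Uniform(0, 1) under the null, so that the rth order statistic follows a Beta(r, K - r + 1) distribution. That independence assumption, the function names `rop_statistic` and `rop_p_value`, and the example numbers are all introduced here for illustration; the dissertation may assess significance differently.

```python
from math import comb


def rop_statistic(p_values, r):
    """Return the r-th smallest p-value (r is 1-indexed) across K studies."""
    k = len(p_values)
    if not 1 <= r <= k:
        raise ValueError("r must satisfy 1 <= r <= K")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("every p-value must lie in [0, 1]")
    return sorted(p_values)[r - 1]


def rop_p_value(p_values, r):
    """P-value of the rOP statistic under an assumed null of K independent
    Uniform(0, 1) p-values.

    The r-th order statistic of K iid uniforms is Beta(r, K - r + 1).
    Its CDF at x equals P(Binomial(K, x) >= r), which is summed directly
    here so that only the standard library is needed.
    """
    k = len(p_values)
    x = rop_statistic(p_values, r)
    return sum(comb(k, j) * x**j * (1 - x) ** (k - j) for j in range(r, k + 1))


# Hypothetical p-values for one gene in K = 4 studies (illustrative only).
example = [0.01, 0.2, 0.03, 0.5]
statistic = rop_statistic(example, r=3)  # third-smallest p-value
significance = rop_p_value(example, r=3)
```

In this sketch, r = K uses the maximum p-value, which matches the "in all" setting, and r = 1 uses the minimum p-value, which matches the "in any" setting. Intermediate values of r correspond to the "in the majority of" setting that HSr targets.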
Keywords/Search Tags: Hypothesis, Methods, Meta-analysis