
Extreme Value Analysis of The Likelihood Ratio Statistic in Linear Mixed Models

Posted on: 2017-11-14
Degree: Ph.D
Type: Thesis
University: University of California, Davis
Candidate: Ding, Kai
Full Text: PDF
GTID: 2460390014952018
Subject: Biostatistics
Abstract/Summary:
In current genomic studies, technologies such as microarrays usually produce large-scale data sets with a limited sample size. For example, in a DNA microarray experiment, expression levels of tens of thousands of genes are measured for a small number of experimental subjects. One major statistical concern in analyzing such large data sets is the multiple testing problem (multiple comparisons, or multiplicity) that arises in simultaneous inference. Corrections that fully account for the consequences of multiple testing require accurate knowledge of the distribution of the test statistics under the null hypothesis.

One common method for analyzing gene expression data is the linear mixed model, which contains both fixed and random effects. Random effects, also referred to as variance components, are usually used to model individual-level variation that cannot be described by fixed effects. In a controlled experiment using DNA microarrays, for each gene measured, there is interest in testing whether its expression level varies across experimental subjects. Moreover, if such variation exists, it might interact with fixed effects to influence the measured expression level. Both the subject-level variation and its interaction with fixed effects are modeled as random effects in a linear mixed model. For testing the existence of such random effects, the most widely used statistical test is the likelihood ratio test. However, when testing the existence of one or more variance components (random effects), using the null distribution of the likelihood ratio statistic provided by Wilks usually yields highly conservative results because of nonstandard conditions. Although various methods have been proposed to approximate the null distribution of the likelihood ratio statistic under nonstandard conditions, each has its own flaws or limitations. The problem is exacerbated in large-scale data sets, where multiple testing corrections require high accuracy in the upper tail of the null distribution of the likelihood ratio statistic. To overcome this problem, this work provides methods based on extreme value distribution theory to study the finite-sample null distribution of the likelihood ratio statistic in linear mixed models for large-scale data sets.

In Chapter 1, the linear mixed model and the likelihood ratio test are introduced, and existing methods for approximating the null distribution of the likelihood ratio statistic under nonstandard conditions are discussed. An introduction to extreme value distribution theory is given as the basis of its application in this work.

Chapter 2 describes the relationship between the upper tail of an unknown distribution and its corresponding generalized extreme value distribution. Methods based on the block-maxima model are provided to estimate extreme upper quantiles and to calculate p-values for extreme values. A framework is built to study the finite-sample null distribution of the likelihood ratio statistic for any experimental design.

In Chapter 3, an alternative method based on a threshold model, the generalized Pareto distribution, is proposed. A numerical procedure for determining the threshold used to fit a generalized Pareto distribution is provided.

Extensive simulations are carried out in Chapter 4 to analyze the performance of existing methods and of the extreme value distribution based methods proposed in this work. Estimates of extreme upper quantiles obtained from each method are compared to assess the accuracy of these approximations. Simulated experimental data sets with various settings and different testing problems are used to evaluate the proposed methods.

Lastly, Chapter 5 discusses the framework presented in this work and recommends approximation methods for different cases.
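The threshold-model idea summarized above can be illustrated with a minimal sketch. This is not the thesis's procedure. It assumes a simulated stand-in for the null statistic (zero with probability 1/2, otherwise a chi-square draw with one degree of freedom), a hypothetical threshold fixed at the empirical 99% point instead of the numerical threshold selection the thesis describes, and a simple method-of-moments fit of the generalized Pareto distribution instead of whatever estimator the thesis uses. All function names are illustrative.

```python
import math
import random
import statistics


def simulate_null_lrt(n, rng):
    """Stand-in null statistics (assumption): 0 w.p. 1/2, else a chi-square(1) draw."""
    return [rng.gauss(0.0, 1.0) ** 2 if rng.random() < 0.5 else 0.0 for _ in range(n)]


def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit; returns (shape xi, scale sigma). Needs xi < 1/2."""
    m = statistics.fmean(excesses)
    s2 = statistics.variance(excesses)
    ratio = m * m / s2                      # equals 1 - 2*xi for a GPD
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)
    return xi, sigma


def tail_prob(x, u, xi, sigma, zeta):
    """Estimated P(X > x) for x >= u, where zeta estimates P(X > u)."""
    z = (x - u) / sigma
    if abs(xi) < 1e-8:                      # exponential-tail limit of the GPD
        return zeta * math.exp(-z)
    base = 1.0 + xi * z
    return zeta * base ** (-1.0 / xi) if base > 0 else 0.0


def tail_quantile(alpha, u, xi, sigma, zeta):
    """Value exceeded with probability alpha, for alpha <= zeta."""
    if abs(xi) < 1e-8:
        return u + sigma * math.log(zeta / alpha)
    return u + sigma / xi * ((alpha / zeta) ** (-xi) - 1.0)


rng = random.Random(2017)
stats = sorted(simulate_null_lrt(100_000, rng))
u = stats[int(0.99 * len(stats))]           # hypothetical threshold choice
excesses = [x - u for x in stats if x > u]
zeta = len(excesses) / len(stats)           # empirical exceedance rate
xi, sigma = fit_gpd_moments(excesses)
q = tail_quantile(1e-3, u, xi, sigma, zeta)
print(f"threshold={u:.3f} xi={xi:.3f} sigma={sigma:.3f} q(0.999)={q:.3f}")
```

For this stand-in distribution the exact 0.999 quantile is about 9.55, so the printed estimate can be compared against it. In practice the simulated statistics would come from refitting the null linear mixed model, and the threshold would be chosen by a data-driven procedure.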
Keywords/Search Tags:Likelihood ratio statistic, Linear mixed, Extreme value, Data sets, Methods, Random effects, Chapter, Work