
Exploration of gene region simulation, correction for multiple testing, and summary methods

Posted on: 2013-07-25
Degree: Ph.D.
Type: Dissertation
University: Boston University
Candidate: Hendricks, Audrey Eleanor
GTID: 1454390008463292
Subject: Biology
Abstract/Summary:
Genome-wide association studies (GWAS) produce a wealth of data. However, the most highly associated markers leave a substantial portion of the genetic heritability of complex diseases unexplained. Researchers have recently demonstrated that a much larger proportion of the genetic variation can be explained by delving more deeply into the data. For instance, Yang et al. showed that approximately 50% of the heritability of human height is explained by about 300,000 markers taken together. To extract this information, researchers are moving to more complex analyses that model the relationship between a trait and two or more genes. These complex analyses often use gene regions, rather than individual markers, as the unit of analysis. We call this gene region analysis, and deciding how to represent each region is often an obstacle. Here, we lay the foundation for evaluating summary methods used in complex gene-based analyses by exploring three aspects of gene region analysis: (1) simulating a gene region, (2) adjusting for multiple testing, and (3) detecting association with a gene region using summary methods.
We first compare simulation methods and find that the software program Hapgen produces replicates that give adequate sampling variability while retaining the unique characteristics of the gene region used for simulation. We then evaluate methods to adjust for multiple comparisons within a gene region. Extreme tail theory performs well but is computationally expensive, whereas Li and Ji's effective-number-of-independent-SNPs method is computationally efficient but does not always maintain the appropriate type I error rate. Finally, we find that summarizing a gene region by the marker with the lowest p-value often gives the highest power in regions with moderate to high correlation among markers, while a summary method based on forward selection with the Bayesian information criterion (BIC) performs better in regions with low correlation.
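To illustrate the effective-number-of-SNPs idea, the sketch below computes Li and Ji's M_eff from the eigenvalues λ of the SNP correlation matrix, using M_eff = Σ f(|λ|) with f(x) = I(x ≥ 1) + (x − ⌊x⌋), and then applies a Šidák-style correction to the region's lowest marker p-value. It assumes numpy, a genotype matrix of allele counts with no monomorphic SNPs, and hypothetical helper names (`li_ji_meff`, `region_min_p`). Pairing the minimum p-value with M_eff in this way is one possible combination, not necessarily the procedure used in the dissertation.

```python
import numpy as np


def li_ji_meff(genotypes: np.ndarray) -> float:
    """Li and Ji's effective number of independent SNPs (M_eff).

    genotypes: (n_individuals, n_snps) matrix of allele counts (0/1/2).
    Assumes every SNP is polymorphic, so all correlations are defined.
    """
    # SNP-by-SNP correlation matrix (columns are SNPs); atleast_2d handles one SNP.
    corr = np.atleast_2d(np.corrcoef(genotypes, rowvar=False))
    # The matrix is symmetric, so eigvalsh returns real eigenvalues.
    # Rounding guards against values like 1.9999999999999998 landing on the
    # wrong side of an integer in the floor() below.
    lam = np.round(np.abs(np.linalg.eigvalsh(corr)), 8)
    # f(x) = I(x >= 1) + (x - floor(x)), summed over all eigenvalues.
    return float(np.sum((lam >= 1).astype(float) + (lam - np.floor(lam))))


def region_min_p(p_values, meff: float) -> float:
    """Region-level p-value: lowest marker p-value, Sidak-corrected for M_eff tests."""
    p_min = min(p_values)
    return 1.0 - (1.0 - p_min) ** meff


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical region: 10 SNPs, where SNPs 5-9 duplicate SNPs 0-4 (perfect LD).
    base = rng.binomial(2, 0.3, size=(500, 5)).astype(float)
    genotypes = np.hstack([base, base])
    meff = li_ji_meff(genotypes)
    print(f"M_eff = {meff:.2f} for {genotypes.shape[1]} SNPs")
    print(f"region p = {region_min_p([0.004, 0.03, 0.2], meff):.4f}")
```

Because f(x) ≤ x for every non-negative eigenvalue and the eigenvalues of a correlation matrix sum to the number of SNPs m, M_eff ≤ m, so this correction is never stricter than a Šidák correction over all m markers.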
These findings will help researchers design simulation studies that explore the performance of gene region summary measures in complex analyses, adjust for multiple comparisons when testing markers in a gene region, and use gene region summary measures to detect association between a region and a trait.
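The abstract does not spell out how the BIC-based summary is built. As a rough sketch, assuming a quantitative trait, additive allele-count coding, and an ordinary least-squares model, BIC forward selection repeatedly adds the marker that most lowers BIC = n log(RSS/n) + k log(n) and stops when no remaining marker improves it. The function names (`ols_bic`, `bic_forward_selection`) are hypothetical, numpy is assumed, and how the selected markers are turned into a single region-level test is left open.

```python
import numpy as np


def ols_bic(y: np.ndarray, design: np.ndarray) -> float:
    """BIC of a least-squares fit, up to an additive constant shared by all models.

    Uses n * log(RSS / n) + k * log(n), with k = number of regression coefficients.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)


def bic_forward_selection(y: np.ndarray, genotypes: np.ndarray):
    """Greedy forward selection of SNPs, stopping when no addition lowers BIC.

    Returns the indices of the selected SNPs (in the order chosen) and the final BIC.
    """
    n, m = genotypes.shape
    selected = []
    design = np.ones((n, 1))  # start from the intercept-only model
    best_bic = ols_bic(y, design)
    while len(selected) < m:
        # Score every SNP not yet in the model.
        scores = [
            (ols_bic(y, np.column_stack([design, genotypes[:, j]])), j)
            for j in range(m)
            if j not in selected
        ]
        bic, j = min(scores)
        if bic >= best_bic:  # no candidate improves the model: stop
            break
        best_bic = bic
        selected.append(j)
        design = np.column_stack([design, genotypes[:, j]])
    return selected, best_bic


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical region: 5 unlinked SNPs; only SNP 2 affects the trait.
    genotypes = rng.binomial(2, 0.3, size=(300, 5)).astype(float)
    y = 1.5 * genotypes[:, 2] + rng.normal(0.0, 0.5, size=300)
    selected, bic = bic_forward_selection(y, genotypes)
    print("selected SNPs:", selected)
```

Leaving the residual-variance parameter out of k does not change which model is chosen, because every candidate model shares it.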
Keywords/Search Tags: Gene, Summary, Multiple, Association, Simulation, Methods, Complex, Testing