Novel Methodologies in Statistical Genetics for the Discovery of Causal Variants

Posted on: 2012-01-16
Degree: Ph.D
Type: Dissertation
University: Harvard University
Candidate: Lipman, Peter J
Full Text: PDF
GTID: 1454390008992317
Subject: Biology
Abstract/Summary:
The excitement over findings from genome-wide association studies has been tempered by the difficulty of finding the true causal variants, rather than markers correlated with them. Recent studies have analyzed multiple, highly correlated phenotypes (with expression profiles and epigenetic data), making it difficult to understand which associations are causal and which only appear causal because of phenotypic correlations. We introduce a new statistical approach to detect causal genetic effects on survival data in the presence of genetic associations with secondary phenotypes that might also influence survival. The approach is then used to analyze survival after cardiac surgery, where genetic components of myocardial infarction are found not to influence post-surgery hospital duration except through the myocardial infarction pathway. We implement the methodology for discrete, continuous, and survival primary outcomes in an R package, 'CGene'.

Even in large-scale genome-wide association studies, only a fraction of the true associations are detected at the genome-wide significance level. When few or no associations reach the significance threshold, one strategy is to perform a follow-up study on the most promising candidates. We propose an overall test that analyzes the single nucleotide polymorphisms with the smallest p-values simultaneously, allowing for an early assessment of whether a follow-up study is likely to be promising. An application to a genome-wide association study of chronic obstructive pulmonary disease suggests that there are true associations among the top single nucleotide polymorphisms; a follow-up study is recommended.

It has been difficult to implicate causal variants using the genome-wide association study approach, which is an indirect mapping technique that often detects markers rather than causal variants. For the identification of disease susceptibility loci, sequencing data that examine every genetic locus directly are necessary. We propose a novel method to detect disease susceptibility loci using sequencing data by analyzing patterns of linkage disequilibrium in a case-control setting. Simulations suggest that the method distinguishes the causal variant from other nearby variants and outperforms the standard tests. An application to a dataset for nonsyndromic cleft lip with or without cleft palate illustrates the practical relevance of the approach. The method implicates one variant that was not found by standard analyses.
Keywords/Search Tags: Causal, Genome-wide association studies, Method, Genetic, Approach