Genome-wide association studies in statistical genetics

Posted on: 2009-07-07
Degree: Ph.D
Type: Dissertation
University: Michigan Technological University
Candidate: Tang, Rui
GTID: 1444390002495909
Subject: Biology
Abstract/Summary:
This dissertation is composed of three separate parts. The first proposes a new approach for genetic association analysis based on a variable-sized sliding-window framework. The second proposes a method of computing p-values adjusted for correlated tests that attains the accuracy of permutation or simulation-based tests in much less computation time. The third addresses genome-wide association studies based on the real rheumatoid arthritis (RA) data sets from Genetic Analysis Workshop 16 (GAW16) Problem 1.

With the recent rapid improvements in high-throughput genotyping techniques, researchers face the very challenging task of large-scale genetic association analysis, especially at the whole-genome level, without an optimal solution. In Part I of this dissertation, we propose a new approach for genetic association analysis that is based on a variable-sized sliding-window framework and employs principal component analysis to find the optimum window size. By using a bisection algorithm to search over window sizes, our method addresses the computational burden of an exhaustive search and is more efficient and effective than currently available approaches. We evaluate the performance of the proposed method by comparing it with two other methods: tests based on a single-nucleotide polymorphism (SNP) and the variable-length Markov chain method. Using data sets simulated under different disease models, we demonstrate that the proposed method consistently outperforms the other two methods, especially under multi-locus disease models. Furthermore, since the proposed method is based on genotype data, it does not require any computationally intensive phasing program to account for uncertain haplotype phase. In the real data analysis, we conduct a genome-wide association study of the GAW16 Problem 1 data using the proposed method. We successfully identified several susceptibility genes that have been reported by other researchers, as well as additional candidate disease genes for follow-up.

In the second part, we deal with p-value correction for multiple testing, especially when the tests are correlated with each other. With genome-wide association (GWA) studies becoming a priority, large-scale genetic association studies can test hundreds of thousands of genetic markers for association with a trait. Many of the association tests may be correlated because of linkage disequilibrium between nearby markers. The permutation procedure is a standard statistical technique for determining significance when performing multiple correlated tests for genetic association, since conventional corrections such as the Bonferroni (or Šidák) procedure are typically too stringent. However, the permutation procedure is computationally demanding for large-scale genetic association studies. In this dissertation, we propose a method of computing p-values adjusted for correlated tests that attains the accuracy of permutation or simulation-based tests in much less computation time. We demonstrate through simulation that this method provides a valid adjustment for a large number of correlated association tests and is more powerful than the Šidák procedure and the method proposed by Karen and Michael (2007). The method breaks the large analysis down into blocks within which the SNPs are highly correlated with each other. We use a Markov model to account for the relationship between neighboring blocks and compare the observed test statistics for each block directly to their asymptotic distribution through numerical integration.

In the third part, I discuss two applications of statistical methods for genome-wide association studies. Random forests (RFs) have been proposed as an alternative strategy for the analysis of genetic data. I introduce novel uses of the random forest approach for the assessment of gene and haplotype importance, and apply the proposed approaches to the detection of genes containing variations that predict rheumatoid arthritis (RA). In addition, indirect association as a result of linkage disequilibrium (LD) is a key factor in the success of genetic association studies, so new imputation methods are an important addition to genetic epidemiologic methods. I compare the performance of several imputation methods in two settings: combining two data sets that have been genotyped at different sets of markers, and imputing completely missing (i.e., "untyped") markers. Methods were compared in terms of imputation error rates and the performance of association tests that use the imputed data. The GAW16 Problem 1 data set, provided by the North American Rheumatoid Arthritis Consortium (NARAC), was used.
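
To make the multiple-testing comparison in the second part concrete, the following minimal Python sketch contrasts the standard Bonferroni and Šidák adjustments with a generic max-statistic permutation adjustment. It is an illustration only and does not implement the block-based method proposed in the dissertation. The function names, the toy data layout (one row of allele counts per individual, phenotype coded 0/1), and the mean-difference statistic are all assumptions made for this example.

```python
import random


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for one of m tests."""
    return min(1.0, m * p)


def sidak(p, m):
    """Sidak-adjusted p-value; exact only when the m tests are independent."""
    return 1.0 - (1.0 - p) ** m


def mean_diff_stats(genotypes, phenotype):
    """Per-SNP |mean allele count in cases - mean allele count in controls|.

    Hypothetical stand-in statistic, chosen only to keep the sketch short.
    """
    cases = [row for row, y in zip(genotypes, phenotype) if y == 1]
    controls = [row for row, y in zip(genotypes, phenotype) if y == 0]
    n_snps = len(genotypes[0])
    return [
        abs(sum(r[j] for r in cases) / len(cases)
            - sum(r[j] for r in controls) / len(controls))
        for j in range(n_snps)
    ]


def permutation_adjusted_p(genotypes, phenotype, n_perm=1000, seed=0):
    """Adjusted p-value for the strongest SNP via the max-statistic permutation test.

    Phenotype labels are shuffled while genotype rows stay intact, so the
    correlation between SNPs is preserved in every permuted data set.
    """
    rng = random.Random(seed)
    observed_max = max(mean_diff_stats(genotypes, phenotype))
    labels = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max(mean_diff_stats(genotypes, labels)) >= observed_max:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 6 individuals, 2 SNPs coded as allele counts 0/1/2.
    genotypes = [[0, 0], [0, 1], [1, 1], [2, 2], [2, 1], [1, 2]]
    phenotype = [0, 0, 0, 1, 1, 1]
    print("Bonferroni:", bonferroni(0.01, 10))
    print("Sidak:     ", sidak(0.01, 10))
    print("Permutation:", permutation_adjusted_p(genotypes, phenotype, n_perm=200))
```

The cost of the permutation loop grows with both the number of permutations and the number of markers, which is the computational burden that motivates faster adjustments of the kind described in the second part.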
Keywords/Search Tags:Association, Genetic, Data, Rheumatoid arthritis, Part, Method, GAW16, Statistical