Clustering by genetic ancestry using genome-wide single nucleotide polymorphisms and incorporating genetic ancestry into genetic risk prediction models
Posted on: 2012-04-11 | Degree: Ph.D. | Type: Thesis | University: Boston University | Candidate: Solovieff, Nadia | GTID: 2464390011466866 | Subject: Biology

Abstract/Summary:

Genome-wide association studies (GWAS) have detected disease-associated variants and increased the feasibility of building genetic risk prediction models. Population stratification (PS) causes spurious associations in GWAS and occurs when differences in allele frequencies of genetic markers are due to ancestral differences between cases and controls rather than to the disease. Principal components analysis (PCA) is the established approach for detecting PS and for adjusting genetic association tests for stratification by including the top principal components (PCs) in the analysis. An alternative solution to PS is genetic matching of cases and controls; this, however, requires well-defined population strata for appropriate selection of cases and controls. The strata defined for matching allow the investigator to examine cluster-specific effects, which can enhance our understanding of disease-associated variants and improve the accuracy of risk prediction models.

In this thesis, we propose a new approach to test genetic associations and build genetic risk models in the presence of PS from GWAS. We first design a novel algorithm that uses the top PCs from a PCA to cluster individuals with similar ancestry into groups to match cases and controls.
We demonstrate the effectiveness of our algorithm in real and simulated data, and show that matching cases and controls substantially reduces PS bias and can be more powerful than adjustment for PCs.

Next, we use the algorithm to examine the population substructure of African Americans with sickle cell disease and show that they are less genetically admixed than African Americans without the disease and have ancestry similar to populations from western Africa.

Finally, we propose an approach to build a genetic risk prediction model that incorporates ethnicity-specific effects. We extend the framework of a Bayesian naive classifier to include ancestry and show how a prediction can be made even when the ancestry of an individual is unknown. We compare the Bayesian classifiers to logistic regression models that include a genetic risk score. We show that incorporating ancestry improves the accuracy of prediction in both the Bayesian and logistic regression frameworks, but that the accuracy is higher for the Bayesian classifier.

Keywords/Search Tags: Prediction, Genetic, Models, Ancestry, GWAS, Disease, Cases and controls, Bayesian
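
The abstract states that the ancestry-aware Bayesian classifier can still produce a prediction when an individual's ancestry is unknown, but it does not give the formula. One standard way to do this, shown here only as a minimal sketch and not as the thesis's actual model, is to sum the naive Bayes joint probability over ancestry clusters, weighting each cluster by its prior probability. Everything named below is a hypothetical choice made for illustration: the functions `hwe_table` and `posterior_case`, the array layout, the Hardy-Weinberg genotype probabilities, and the toy allele frequencies.

```python
import numpy as np


def hwe_table(allele_freq):
    """Turn allele frequencies of shape (K, 2, J) into genotype probabilities
    of shape (K, 2, J, 3) under Hardy-Weinberg proportions (an assumption).

    Axes: K ancestry clusters, 2 disease states (0 = control, 1 = case),
    J SNPs, 3 genotypes (0, 1 or 2 copies of the coded allele).
    """
    p = np.asarray(allele_freq, dtype=float)
    return np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2], axis=-1)


def posterior_case(genotypes, geno_prob, prior_case, prior_anc, ancestry=None):
    """Posterior probability of being a case under a naive Bayes model.

    genotypes  : (J,) integers in {0, 1, 2}
    geno_prob  : (K, 2, J, 3) array, P(g_j | disease state, ancestry)
    prior_case : (K,) array, P(case | ancestry)
    prior_anc  : (K,) array, P(ancestry)
    ancestry   : cluster index, or None when ancestry is unknown
    """
    genotypes = np.asarray(genotypes)
    n_snps = genotypes.shape[0]
    prior_case = np.asarray(prior_case, dtype=float)

    # Naive Bayes: SNPs are treated as independent given disease and ancestry,
    # so the log-likelihood is a sum over SNPs. Result has shape (K, 2).
    picked = geno_prob[:, :, np.arange(n_snps), genotypes]
    log_lik = np.log(picked).sum(axis=2)

    # log P(genotypes, disease | ancestry), shape (K, 2)
    log_prior = np.log(np.stack([1 - prior_case, prior_case], axis=1))
    log_joint = log_lik + log_prior

    if ancestry is not None:
        lj = log_joint[ancestry]
    else:
        # Unknown ancestry: sum P(ancestry) * P(genotypes, disease | ancestry)
        # over clusters, computed in log space for numerical stability.
        weighted = log_joint + np.log(np.asarray(prior_anc, dtype=float))[:, None]
        lj = np.logaddexp.reduce(weighted, axis=0)

    # Normalize over the two disease states.
    return float(np.exp(lj[1] - np.logaddexp(lj[0], lj[1])))


if __name__ == "__main__":
    # Toy allele frequencies: 2 clusters x (control, case) x 2 SNPs.
    freqs = [[[0.1, 0.5], [0.3, 0.5]],
             [[0.6, 0.2], [0.6, 0.4]]]
    table = hwe_table(freqs)
    person = [2, 1]
    print("cluster 0:", posterior_case(person, table, [0.5, 0.5], [0.5, 0.5], ancestry=0))
    print("cluster 1:", posterior_case(person, table, [0.5, 0.5], [0.5, 0.5], ancestry=1))
    print("unknown  :", posterior_case(person, table, [0.5, 0.5], [0.5, 0.5]))
```

Under this formulation the unknown-ancestry prediction is a weighted average of the cluster-specific predictions, with weights equal to the posterior probability of each cluster given the genotypes, so it always lies between the smallest and largest cluster-specific values.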