Clustering by genetic ancestry using genome-wide single nucleotide polymorphisms and incorporating genetic ancestry into genetic risk prediction models

Posted on:2012-04-11

Degree:Ph.D.

Type:Thesis

University:Boston University

Candidate:Solovieff, Nadia

Full Text:PDF

GTID:2464390011466866

Subject:Biology

Abstract/Summary:

Genome-wide association studies (GWAS) have detected disease-associated variants and increased the feasibility of building genetic risk prediction models. Population stratification (PS) causes spurious associations in GWAS and occurs when differences in allele frequencies of genetic markers are due to ancestral differences between cases and controls rather than the disease. Principal components analysis (PCA) is the established approach to detect PS and to adjust the genetic association for stratification by including the top principal components (PCs) in the analysis. An alternative solution to PS is genetic matching of cases and controls, which, however, requires well-defined population strata for appropriate selection of cases and controls. The strata defined for matching allow the investigator to examine cluster-specific effects, which can enhance our understanding of disease-associated variants and improve the accuracy of risk prediction models.

In this thesis, we propose a new approach to test genetic associations and build genetic risk models in the presence of PS from GWAS. We first design a novel algorithm that uses the top PCs from a PCA to cluster individuals with similar ancestry into groups to match cases and controls. We demonstrate the effectiveness of our algorithm in real and simulated data, and show that matching cases and controls substantially reduces PS bias and can be more powerful than adjustment for PCs.

Next, we use the algorithm to examine the population substructure of African Americans with sickle cell disease and show that they are less genetically admixed than African Americans without the disease and have ancestry similar to populations from western Africa.

Finally, we propose an approach to build a genetic risk prediction model that incorporates ethnic-specific effects. We extend the framework of a Bayesian naive classifier to include ancestry and show how a prediction can be made even when the ancestry for an individual is unknown. We compare the Bayesian classifiers to logistic regression models that include a genetic risk score. We show that incorporating ancestry improves the accuracy of prediction in both the Bayesian and logistic regression frameworks but that the accuracy is higher for the Bayesian classifier.
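
The idea of predicting with or without known ancestry can be illustrated with a minimal sketch. It is not the thesis's actual model: it assumes a naive Bayes structure in which genotypes are conditionally independent given disease status and ancestry cluster, so that an unknown ancestry is handled by summing over clusters. The function name posterior_case, the parameter tables P_ANC and P_GENO, and all probabilities below are hypothetical values chosen for illustration only.

```python
def posterior_case(genotypes, prior_case, p_anc, p_geno, ancestry=None):
    """Return P(case | genotypes) or P(case | genotypes, ancestry).

    Assumed (illustrative) model:
      P(D, A, G) = P(D) * P(A | D) * prod_j P(G_j | D, A)
    p_anc[d][a]        : P(ancestry cluster a | status d)
    p_geno[d][a][j][g] : P(genotype g at SNP j | status d, cluster a)
    """
    prior = [1.0 - prior_case, prior_case]
    joint = [0.0, 0.0]
    for d in (0, 1):
        # Known ancestry: use that cluster only.
        # Unknown ancestry: marginalize over every cluster.
        clusters = range(len(p_anc[d])) if ancestry is None else [ancestry]
        total = 0.0
        for a in clusters:
            likelihood = p_anc[d][a]
            for j, g in enumerate(genotypes):
                likelihood *= p_geno[d][a][j][g]
            total += likelihood
        joint[d] = prior[d] * total
    return joint[1] / (joint[0] + joint[1])


# Hypothetical toy parameters: two ancestry clusters, one SNP coded 0/1/2.
# The SNP is associated with disease in cluster 0 only.
P_ANC = [[0.5, 0.5],   # controls
         [0.5, 0.5]]   # cases
P_GENO = [
    [[[0.5, 0.3, 0.2]], [[0.5, 0.3, 0.2]]],  # controls: cluster 0, cluster 1
    [[[0.2, 0.3, 0.5]], [[0.5, 0.3, 0.2]]],  # cases:    cluster 0, cluster 1
]

known0 = posterior_case([2], 0.5, P_ANC, P_GENO, ancestry=0)
known1 = posterior_case([2], 0.5, P_ANC, P_GENO, ancestry=1)
unknown = posterior_case([2], 0.5, P_ANC, P_GENO)
print(known0, known1, unknown)
```

Under these assumptions the prediction for unknown ancestry is a weighted average of the cluster-specific predictions, so it always falls between the smallest and largest known-ancestry posteriors.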

Keywords/Search Tags:

Prediction, Genetic, Models, Ancestry, GWAS, Disease, Cases and controls, Bayesian

Related items

1	Building Risk Prediction Model for Complex Genetic Disease Using High Dimensional Genetic Data
2	Craniometric ancestry proportions among groups considered Hispanic: genetic biological variation, sex-biased asymmetry, and forensic applications
3	Ancestry Estimation and Application to the Genetic of Complex Diseases in Human
4	The Efficiency Of 27-plex SNP Multiplex System For Unknown DNA Donor Ancestry Estimates Of Intercontinental Populations
5	Prediction Model Of Biochemical Recurrence Within 1 Year After Radical Prostatectomy Based On Bayesian Network
6	Developing A Novel Panel Of Ancestry Informative Markers For Unknown DNA Donor Ancestry Estimates Of Intercontinental Populations
7	Construction And Validation Of A 74AIM-SNPs Multiplex System Based On Capillary Electrophoresis
8	Analysis Of IVIG Non-response Prediction Of Kawasaki Disease Based On Genetic And Laboratory Tests And Comparison Of Five Prediction Models
9	Research On LncRNA-disease Association Prediction Based On Bayesian Generative Adversarial Networks
10	The Selection Of Tibetan Ancestry Informative SNPs And Forensic DNA Geographical Ancestry Inference