Genetic matching by ancestry in genome-wide association studies

Posted on: 2009-12-09
Degree: Ph.D
Type: Thesis
University: Carnegie Mellon University
Candidate: Luca, Diana
Full Text: PDF
GTID: 2443390002499394
Subject: Statistics
Abstract/Summary:
As part of the quest to understand the genetic underpinnings of complex disease, individuals are measured at a large number of genetic variants across the genome. The objective is to discover variants associated with increased liability to a phenotype of interest. In this thesis we propose a new statistical method called GEnetic Matching by Ancestry (GEM) for the analysis of genome-wide association studies in the presence of population structure. Ignoring structure due to differential ancestry can lead to an excess of spurious findings and reduce power. Ancestry is estimated using the eigenvectors from the singular value decomposition of a kernel matrix computed from the genetic data. The distance between individuals is calculated as the Euclidean distance defined by the leading eigenvectors. The effects of ancestry are removed by matching cases and controls. If the controls are chosen by convenience, it is likely that some cases cannot be successfully matched to controls and vice versa. Typically, samples are drawn from populations consisting of a number of relatively homogeneous subpopulations. Using statistical algorithms, we identify these subpopulations and rescale them so that the distances between matched subjects are comparable to distances observed in a homogeneous sample. Unmatchable observations are outliers in this metric. To improve GEM, we draw connections between singular value decomposition, principal component analysis, multidimensional scaling and spectral graph theory. We identify the effects of using various kernel matrices and spectral embedding algorithms.
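The embedding and distance steps described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. The function names `ancestry_embedding` and `pairwise_distances`, the per-SNP standardization, and the scaling of eigenvectors by the square roots of their eigenvalues are all assumptions made for illustration; the abstract itself notes that several kernel matrices and embedding algorithms are possible.

```python
import numpy as np


def ancestry_embedding(genotypes, n_components):
    """Embed individuals with the leading eigenvectors of a kernel matrix.

    genotypes: (n_individuals, n_snps) array of allele counts in {0, 1, 2}.
    Returns (coords, eigvals): coords has shape (n_individuals, n_components),
    eigvals holds all kernel eigenvalues in descending order.
    """
    G = np.asarray(genotypes, dtype=float)

    # Assumed normalization: center each SNP at twice its sample allele
    # frequency and scale by the binomial standard deviation.
    p = G.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)  # drop SNPs with no variation in the sample
    G, p = G[:, keep], p[keep]
    Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))

    # Kernel matrix between individuals (one possible choice of kernel).
    K = Z @ Z.T / Z.shape[1]

    # K is symmetric, so eigh applies; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Assumed scaling (as in classical multidimensional scaling): weight each
    # leading eigenvector by the square root of its eigenvalue.
    lead = np.clip(eigvals[:n_components], 0.0, None)
    coords = eigvecs[:, :n_components] * np.sqrt(lead)
    return coords, eigvals


def pairwise_distances(coords):
    """Euclidean distances between all pairs of embedded individuals."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

With coordinates of this kind, a case and a control from the same subpopulation should lie closer together than a pair drawn from different subpopulations, which is the property that matching relies on.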
We investigate two statistical procedures for selecting the number of significant eigenvectors to be used for data embedding: a significance test for population structure based on the distribution of the largest eigenvalue of the covariance matrix of allele counts (Tracy-Widom theory), and an eigengap heuristic, which uses the difference between adjacent eigenvalues as a measure of population homogeneity.
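As a rough illustration of the eigengap idea, the sketch below picks the number of eigenvectors that precede the largest drop between adjacent eigenvalues. This is one common form of the heuristic and is assumed here; the exact rule used in the thesis is not stated in the abstract. The function name `eigengap_dimension` and the `max_components` parameter are hypothetical. The Tracy-Widom test is not sketched, since it requires the Tracy-Widom distribution, which NumPy does not provide.

```python
import numpy as np


def eigengap_dimension(eigenvalues, max_components=None):
    """Return the number of leading eigenvalues before the largest gap.

    eigenvalues: 1-D sequence of eigenvalues, in any order.
    max_components: optional cap on the number of eigenvectors considered.
    """
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    if max_components is not None:
        # k components need k + 1 eigenvalues to form k gaps.
        vals = vals[: max_components + 1]
    if vals.size < 2:
        raise ValueError("need at least two eigenvalues to compute a gap")

    # gaps[i] is the drop from the (i + 1)-th to the (i + 2)-th eigenvalue.
    gaps = vals[:-1] - vals[1:]
    return int(np.argmax(gaps)) + 1
```

For example, the eigenvalues 10, 9.5, 2, 1.8, 1.7 have adjacent gaps 0.5, 7.5, 0.2, 0.1, so the function returns 2. Under this reading, a spectrum with no pronounced gap suggests a relatively homogeneous sample.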
Keywords/Search Tags:Genetic, Ancestry, Matching