Genetic association studies using complex survey data

Posted on: 2011-07-28
Degree: Ph.D
Type: Dissertation
University: The George Washington University
Candidate: She, Dewei
Full Text: PDF
GTID: 1448390002453332
Subject: Biology
Abstract/Summary:
The National Health and Nutrition Examination Survey (NHANES) has been conducted periodically since 1960 by the National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention (CDC). NHANES provides national estimates of the health and nutritional status of the U.S. civilian, noninstitutionalized population. The Third National Health and Nutrition Examination Survey (NHANES III) began in the fall of 1988 and ended in the fall of 1994. During the second phase of NHANES III (from October 1991 to October 1994), blood lymphocytes were collected from 7,159 participants aged 12 years and older in anticipation of advances in genetic research. Linking the NHANES III phenotype data with this genetic information provides an opportunity to investigate the association of a wide variety of health factors with genetic variation [NCHS, 2008a].

National surveys such as NHANES III employ a complex, multistage, probability sampling design to select participants representative of the civilian, noninstitutionalized U.S. population. If observations within sampled clusters are correlated and the correlation is ignored, standard errors can be underestimated. For each sampled individual, the sample weight is the inverse of the product of the selection probabilities across all stages of sampling. If these sample weights are correlated with the characteristics of research interest, an analysis that does not take the weights into account can be biased [Korn and Graubard, 1999]. Test procedures developed for simple random samples are generally unsuitable for the analysis of data from these complex sample designs.

The classic question when examining genetic variation in a population is whether the population is in Hardy-Weinberg Equilibrium (HWE). Li and Graubard proposed Wald-like tests for HWE in a complex survey setting, assuming that all individuals were independent [Li and Graubard, 2009]. This dissertation develops procedures for testing departures from HWE using family data in a complex survey setting. Two kinds of study designs, population-based and family-based, have been used for genetic association studies with simple random samples. This dissertation develops test procedures for identifying an association between a candidate gene and a disease using population-based complex survey data, with and without the use of family structure information. The test procedures developed in the dissertation are applied to NHANES III data to test HWE for three loci, ADRB2 (rs1042713), VDR (rs2239185), and TGFB1 (rs1982073), and to test for associations between ADRB2 (rs1042713) and obesity, between VDR (rs2239185) and high blood lead level, and between TGFB1 (rs1982073) and asthma.

Chapter 1 provides an introduction to the study topics and a summary of the results from the dissertation.

Chapter 2 provides a detailed literature review of tests for HWE and for association between a candidate gene and disease, in both simple random sampling and complex sampling settings.

Chapter 3 presents six Pearson chi-square based tests for departures from HWE at a diallelic locus of autosomal genes, using family data in a complex survey setting. The finite sample properties of the proposed test procedures are evaluated via Monte Carlo simulation studies. The test procedures are applied to three loci from the NHANES III genetic databases: ADRB2 (rs1042713), VDR (rs2239185), and TGFB1 (rs1982073).

Chapter 4 focuses on trend tests for genetic association using population-based cross-sectional complex survey data. Tests for trend in disease with an increasing number of alleles have been developed for simple random samples. However, surveys such as NHANES III have complex sample designs involving multistage cluster sampling and sample weighting. These sample designs can affect the Type I error and power properties of statistical tests derived for simple random samples. This chapter derives tests of trend based on Wald and quasi-score statistics, with and without assuming a genetic model, that account for the complex sampling design. Type I error rates and power are examined and compared among the different test statistics via Monte Carlo simulation studies.

Chapter 5 presents conditional likelihood score tests and trend tests using data from nuclear families in a complex sample setting such as NHANES III. Simulation studies for different settings are conducted to evaluate and compare the Type I error rates and power of the different test statistics. The F-version of the trend test is recommended for genetic association studies that use complex sampling with family data.

Finally, Chapter 6 discusses the strengths and limitations of the proposed tests. Future research needs are also presented.
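As a rough illustration of two ideas in the abstract, the sketch below computes a sample weight as the inverse of the product of per-stage selection probabilities, then uses those weights to estimate genotype proportions and the HWE disequilibrium coefficient D = P(AA) - p^2. It is a minimal sketch and not the dissertation's test procedure: the function names, the 0/1/2 genotype coding, and the toy data are all assumptions made for illustration, and no design-based variance estimation (which the proposed tests require) is attempted.

```python
from math import prod


def sample_weight(stage_probs):
    """Inverse of the product of the selection probabilities across stages."""
    return 1.0 / prod(stage_probs)


def weighted_genotype_props(genotypes, weights):
    """Weighted proportions of genotypes coded 0, 1, 2 (copies of allele A)."""
    total = sum(weights)
    props = [0.0, 0.0, 0.0]
    for g, w in zip(genotypes, weights):
        props[g] += w / total
    return props


def hwe_disequilibrium(props):
    """Return (p, D): allele A frequency and D = P(AA) - p^2.

    D is zero when the genotype proportions satisfy HWE exactly.
    """
    p = props[2] + 0.5 * props[1]
    return p, props[2] - p * p


# Hypothetical toy data: two sampling stages per person (not NHANES values).
stage_probs = [(0.1, 0.5), (0.1, 0.5), (0.2, 0.5), (0.2, 0.25)]
genotypes = [0, 1, 2, 1]

weights = [sample_weight(sp) for sp in stage_probs]
props = weighted_genotype_props(genotypes, weights)
p, d = hwe_disequilibrium(props)
print(weights, props, p, d)
```

A point estimate of D on its own says nothing about significance. Turning it into a test would need a variance estimate that respects the clustering and weighting, which is the gap the procedures described above are meant to address.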
Keywords/Search Tags: Data, Survey, NHANES, Complex, Genetic association studies, Using, Tests, Chapter