Construction Of Predictive Genetic Testing Based On Complex/Quantitative Traits

Posted on: 2012-01-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Y Ye
Full Text: PDF
GTID: 1114330371469158
Subject: Bioinformatics
Abstract/Summary:
This dissertation develops nonparametric methods for building predictive genetic tests for complex traits and diseases. Predictive tests that capitalize on emerging genetic findings hold great promise for enhanced personalized healthcare. However, with existing risk factors and current statistical methods, predictive tests for various diseases have attained only limited discriminative accuracy. Furthermore, with the emergence of large amounts of data from genome-wide association studies (GWAS), interest has shifted towards high-dimensional risk prediction. Here, statistical methods were developed that can build prediction models from existing risk factors as well as from genome-wide datasets. Moreover, the new methods can capture possible interactions between genetic variants and environmental risk factors, further improving the accuracy of prediction models. The forward ROC method is proposed for case-control datasets, while the CORC method is constructed to analyze family-based datasets.

This dissertation consists of four chapters.

Chapter 1 gives an overall background introduction to recent progress in complex disease studies, especially advances in GWAS for detecting potentially impactful genetic variants involved in disease etiology. The development of disease prediction using predictive genetic tests is introduced, from Mendelian traits to complex traits. ROC curve theory and its applications to diagnostic tests in medical care are also briefly introduced, since this is the main theory adopted in the statistical methods used to construct the prediction models in this study.

Chapter 2 introduces a novel nonparametric method, called the forward ROC method, to form predictive genetic tests on high-dimensional case-control data, as well as on existing genetic and environmental risk factors. Compared with existing methods, the new method adopts a computationally efficient algorithm that searches environmental risk factors, genetic predictors across the entire genome, and their possible interactions for an optimal risk prediction model, without relying on prior knowledge of known risk factors. An efficient yet powerful procedure is also incorporated into the method to handle missing data. In simulations and real-data applications, the proposed method was found to outperform the existing approaches. The new method was applied to the Wellcome Trust rheumatoid arthritis (RA) GWAS dataset with a total of 460,547 markers. The results of the risk prediction analysis suggested important roles of HLA-DRB1 and PTPN22 in predicting RA.

Chapter 3 addresses the lack of statistical methods for genetic risk prediction on correlated data. A clustered optimal ROC curve (CORC) method is introduced to build predictive genetic tests using data from family-based genetic research. The CORC method extends the forward ROC method by taking sample correlation into consideration, and implements a forward selection algorithm to allow for high-dimensional data and the capture of possible epistasis. The CORC method was evaluated using both simulations and a real-data application, showing that it performed better than other existing methods under various pedigree structures and underlying disease models. In the real-data application, the method was applied to the large-scale International Multi-Center ADHD Genetics Project dataset and formed a predictive genetic test for conduct disorder. The test reached a low to medium classification accuracy, with an AUC value of 0.6908.

Chapter 4 summarizes the properties of the two proposed prediction methods for complex diseases and discusses their future applications, as well as aspects that could be improved in upcoming studies, especially for less-studied sub-phenotypes and populations. Overall, the two approaches are powerful and robust for high-dimensional risk prediction, and can be applied to case-control datasets and family-based datasets, respectively. They are expected to facilitate future risk prediction by considering a large number of predictors and their possible interactions for improved performance.
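
The abstract does not spell out the forward ROC algorithm, so the following is only a minimal illustrative sketch of the general idea it names: forward selection of predictors guided by the area under the ROC curve (AUC). Everything here is an assumption made for illustration, including the function names `auc` and `forward_select`, the unweighted sum score used to combine markers, the stopping rule, and the toy genotype data. It does not model interactions, missing data, or family correlation, and it should not be read as the dissertation's method.

```python
from itertools import product


def auc(scores, labels):
    """Empirical AUC: the probability that a randomly chosen case scores
    higher than a randomly chosen control, counting ties as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c, k in product(cases, controls)
    )
    return wins / (len(cases) * len(controls))


def forward_select(X, y, max_features=3, min_gain=1e-9):
    """Greedy forward selection (hypothetical simplification).

    At each step, add the column whose inclusion in an unweighted sum
    score gives the highest AUC; stop when no column improves the AUC
    by more than min_gain or max_features columns have been chosen.
    """
    n_features = len(X[0])
    selected, best_auc = [], 0.5  # 0.5 is the AUC of a random guess
    while len(selected) < max_features:
        candidates = []
        for j in range(n_features):
            if j in selected:
                continue
            score = [sum(row[k] for k in selected + [j]) for row in X]
            candidates.append((auc(score, y), j))
        if not candidates:
            break
        top_auc, top_j = max(candidates)
        if top_auc - best_auc <= min_gain:
            break
        selected.append(top_j)
        best_auc = top_auc
    return selected, best_auc


# Toy data (invented): rows are individuals, columns are risk-allele
# counts (0, 1 or 2) at four hypothetical markers; y is 1 for cases.
X_toy = [
    [2, 0, 1, 1],
    [1, 1, 0, 0],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]

selected_toy, auc_toy = forward_select(X_toy, y_toy)
print("selected marker columns:", selected_toy, "AUC:", auc_toy)
```

On real GWAS-scale data the AUC of the selected model would need to be estimated on held-out samples or by cross-validation, since an AUC computed on the same data used for selection is optimistically biased.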
Keywords/Search Tags: Complex traits, disease prediction, high-dimensional datasets, interaction, ROC curve