Development Of An Iterative Usage Of Fixed Effect And Random Effect Models For Powerful And Efficient Genome-Wide Association Studies

Posted on: 2017-02-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Liu
GTID: 1220330485478067
Subject: Animal breeding and genetics and breeding
Abstract/Summary:
Despite substantial success in finding genes underlying human diseases and agriculturally important traits, the Genome-Wide Association Study (GWAS) suffers from two persistent problems: overwhelming false positives and painful false negatives.

False positives result from inflation of the P values used to test genetic markers. The inflation is commonly caused by population structure and kinship among individuals. Incorporating population structure in a General Linear Model (GLM), or both population structure and kinship in a Mixed Linear Model (MLM) with fixed and random effects, controls false positives well but also weakens the signals of true positives, resulting in false negatives. This is a serious problem for dissecting a wide range of important traits, as demonstrated by an example from Arabidopsis (Nature 465: 627–631, 2010). For flowering time, a trait that controls adaptation, the known genes could not be distinguished from the background, whether using a simple model without control for population structure or a complex model that includes a kinship matrix as a random effect.

Here, we present a new method, Fixed and random model Circulating Probability Unification (FarmCPU). A Fixed Effect Model (FEM) and a Random Effect Model (REM) are used iteratively to solve the confounding problem. Pseudo QTNs (Quantitative Trait Nucleotides) are included as fixed effects in the FEM to control false positives and are estimated by the REM. FarmCPU alternates between the FEM and the REM until no new pseudo QTNs are detected. Compared with MLM, FarmCPU significantly improves statistical power and speed. The main results are as follows:
(1) Results from 107 real Arabidopsis experiments show that FarmCPU can be applied generally across species such as human, pig, mouse, and maize, and that it detected candidate genes missed by MLM.
(2) Simulation results indicate that FarmCPU has better power than MLM. For a complex trait with 75% heritability controlled by 500 QTNs, at a false discovery rate of 10%, FarmCPU could detect 50 more QTNs than MLM.
(3) The computing time of FarmCPU is linear in both the number of individuals and the number of genetic markers. A dataset with half a million individuals and half a million markers can now be analyzed in three days. Researchers will be able to analyze exponentially growing datasets and have greater success with less risk when mapping genes of interest.
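
The iterative idea in the abstract (test each marker with the current pseudo QTNs as fixed-effect covariates, update the pseudo QTN set, and stop when no new ones appear) can be illustrated with a small Python sketch. This is not the FarmCPU implementation. The function names `fem_scan` and `iterate_pseudo_qtns`, the simulated data, the normal-approximation P values, and the Bonferroni-style threshold are all assumptions made for illustration. In particular, the REM step is replaced here by a simple greedy choice of the most significant marker, which makes the sketch closer to forward stepwise regression than to the actual method.

```python
import math
import numpy as np

def fem_scan(y, G, pseudo):
    """Fixed-effect step: test each marker one at a time, with the current
    pseudo QTNs as covariates. Returns normal-approximation P values."""
    n, m = G.shape
    # Intercept plus the genotype columns of the current pseudo QTNs.
    base = np.column_stack([np.ones(n)] + [G[:, j] for j in pseudo])
    pvals = np.ones(m)
    for j in range(m):
        if j in pseudo:
            continue  # already in the model as a covariate
        X = np.column_stack([base, G[:, j]])
        beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
        if rank < X.shape[1]:
            continue  # collinear marker; leave its P value at 1
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
        t = beta[-1] / se
        pvals[j] = math.erfc(abs(t) / math.sqrt(2))  # two-sided, normal approx.
    return pvals

def iterate_pseudo_qtns(y, G, alpha=0.01, max_iter=10):
    """Repeat the scan until no new pseudo QTN passes the threshold.
    ASSUMPTION: greedy selection stands in for the REM step."""
    m = G.shape[1]
    pseudo = []
    for _ in range(max_iter):
        pvals = fem_scan(y, G, pseudo)
        best = int(np.argmin(pvals))
        if pvals[best] >= alpha / m:  # no new pseudo QTN detected
            break
        pseudo.append(best)
    return sorted(pseudo)

# Simulated data (hypothetical): 400 individuals, 50 markers, 2 causal markers.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(400, 50)).astype(float)
y = G[:, 3] + G[:, 17] + rng.normal(size=400)
found = iterate_pseudo_qtns(y, G)
print(found)
```

Once a causal marker enters the model as a covariate, the residual variance drops, so the remaining causal marker is tested against a cleaner background. That is the intuition behind using pseudo QTNs as fixed effects instead of relying only on a kinship matrix.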
Keywords/Search Tags: FarmCPU, GWAS, false positive, false negative, confounding, algorithm