
Expanding Genome Association Analysis Methods And Creating The Pragmatic Tools Of Analysis

Posted on: 2019-04-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J B Wang
GTID: 1360330545964081
Subject: Animal breeding and genetics and breeding
Abstract/Summary:
Association analysis relates genotypes, obtained from whole-genome DNA sequencing or SNP chips, to phenotypes. It is used to detect unknown genes underlying human diseases and agricultural traits, or to predict phenotypes from reference genotypes. Depending on the aim of the model, association analysis can be divided into two approaches: Genome-Wide Association Study (GWAS) and Genomic Prediction (GP).

Following two approaches to changing the kinship of the random effect in the Mixed Linear Model, this study develops two Genomic Prediction methods. 1) CBLUP clusters the individuals of the reference and inference populations into groups and replaces the kinship between individuals with the kinship between groups. The compression level is optimized by model fit (likelihood), and prediction is then carried out with the BLUP model. 2) SBLUP is a two-step method. In the first step, a GWAS is run with the SUPER method on the genotypes and phenotypes of the reference population, which gives the optimal bin size and number of bins. In the second step, a kinship matrix is built from the significant bins and used in place of the all-marker kinship in the BLUP model.

We compared our methods with GBLUP and Bayesian LASSO on real traits and simulated data. SBLUP is best suited to simple traits (traits controlled by few genes); in this situation it predicts the phenotypes of the inference population better than the other three methods. CBLUP is best suited to traits controlled by many genes and with low heritability; in this situation it gives the highest accuracy, measured between the observed and predicted phenotypes of the inference population. The real data comprise 157 traits. With these two methods, only 21 traits showed no superiority over Bayesian LASSO, a superiority rate of 86.6%. The two methods extend the range of traits for which BLUP-based methods perform best, and explore how to optimize the random effect and the kinship in the Mixed Linear Model. Computing efficiency was tested on duplicated data and on real big data. Our methods are faster than Bayesian LASSO but slower than GBLUP, which suggests that they retain the advantage of computing efficiency while improving prediction accuracy.

In this research we also recoded GAPIT (Genome Association and Prediction Integrated Tool), one of the most popular tools of its kind worldwide, and named the new software GAPIT3. The software is available online and in use. Its major new functions are: 1) General Linear Model (GLM), Mixed Linear Model (MLM), Compressed Mixed Linear Model (CMLM), SUPER, FarmCPU and Multi-Loci Mixed Linear Model (MLMM) are integrated into one package, so that users can compare the methods with each other in a single software; 2) the code is reorganized into a logical sequence of modules: Data Preparation (DP), Quality Control (QC), Intermediate Components (IC), Sufficient Statistics (SS), and Interpretation and Diagnoses (ID). This structure makes it convenient to call third-party software and adapt its input, in preparation for online big-data analysis in the future; 3) multiple outputs for genotype analysis and GWAS results. In addition to the original GAPIT outputs, we add a neighbor-joining (NJ) tree, 3D PCA, and an analysis of the correlation between a significant gene and the markers around it. These outputs help explain the data structure and present the results of the analysis. The software is published online at www.zzlab.net/GAPIT.

Finally, we developed a new GWAS model focused on detecting the interaction effect between genotype and environment, which divides the genetic effect into an additive part and an interactive part. The software, GbyE, is written in C, with dynamic RAM management, 2-bit genotype coding, parallel computing and other functions for handling big data. For 23 G of original data, which becomes 207 G of data under three environments, GbyE needs only 4 hours to complete the computation. For simulated data with genetic correlation, we also created a new simulation method that specifies how the interaction effect between genotype and environment is simulated and what proportion of the genetic effect it accounts for. The results show that the GbyE model gives higher power on simulated data when most of the genetic effect is interaction effect, and the same power when most of the genetic effect is additive effect. In real data, we applied the model to the Ames and NAM maize populations to detect additive and interactive genes for flowering time, and validated the model by enrichment against previous studies. With a 500 K window size, the validation rate was 30%, higher than the 20% obtained by random selection; with a 1 M window size, the validation rate was 40%, again higher than the 30% obtained by random selection. The software is online at www.zzlab.net/GbyE.
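The abstract lists 2-bit genotype coding among the ways the GbyE software handles large data. The following is a minimal sketch of the general idea only, written in Python rather than the C used by the software. It assumes genotypes are coded 0, 1 and 2, with 3 reserved for a missing call; the actual encoding in GbyE is not described here, and the names `pack_genotypes` and `unpack_genotypes` are hypothetical.

```python
MISSING = 3  # assumed code for a missing genotype call (not from the source)


def pack_genotypes(genotypes):
    """Pack genotype codes (0-3) into bytes, four codes per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError(f"genotype code out of range: {g}")
        # Each code occupies 2 bits; its slot within the byte is i % 4.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


calls = [0, 1, 2, MISSING, 2, 0]
packed = pack_genotypes(calls)
restored = unpack_genotypes(packed, len(calls))
```

Under this scheme six genotype calls fit in two bytes, a quarter of the space needed when each call is stored in its own byte.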
Keywords/Search Tags: Genome-Wide Association Study, Genomic Prediction, BLUP, GAPIT, interaction between gene and environment