
Effect Of Various Marker Genotypic Coding Values On Genome-Wide Association Studies

Posted on: 2021-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Zhang
GTID: 2480306464461954
Subject: Master of Agriculture
Abstract/Summary:
Genome-wide association studies (GWAS) and linkage analyses are widely used for genetic analysis and gene mining of quantitative traits in animals, plants, and humans, so methodological questions in GWAS matter both in theory and in application. Although many GWAS approaches are available, these methods and their software packages code the genotypes aa, Aa and AA differently: (-1, 0, 1), (0, 0.5, 1), (0, 1, 2), (-2p, 1-2p, 2-2p), (0, 1, 0) and (-1, 1, -1). Until now, nothing has been known about how GWAS results differ across these genotypic coding values.

To investigate the effect of the genotypic coding values on results in natural populations with heterogeneous genotypes, we first simulated phenotypic and genotypic datasets for natural populations and transformed the marker genotype datasets into each of the six coding schemes. All datasets were then analyzed with the mrMLM method to detect marker-trait associations. Next, the natural population was replaced with an F2 population, and mrMLM, along with the FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB and ISIS EM-BLASSO methods, was used to investigate the effect of the marker genotype coding values on quantitative trait locus (QTL) mapping. Finally, two real datasets, one from a rice natural population and one from an F2 population, were analyzed to confirm the conclusions of the Monte Carlo simulation studies. The main results are as follows.

1. In the simulated natural population, the 1st to 3rd QTNs were additive, the 4th to 6th were dominant, and the 7th to 10th were additive-dominant. Under the first four coding schemes, the power of mrMLM to detect the ten simulated QTNs was 39.8–41.5, 98.9–99.3, 99.2–99.4, 0.0, 0.0, 0.0, 98.8–99.3, 95.6–96.3, 27.5–29.7 and 3.6–4.0 (%), respectively. Thus the 4th to 6th (dominant) QTNs could not be detected, and the power for additive and additive-dominant QTNs increased with the contribution of the additive effect. When the marker genotypes were transformed into the last two coding schemes, the power of mrMLM was 0.0, 0.0, 0.0, 33.2–34.3, 94.3–94.6, 96.2, 2.0–2.4, 1.7–1.9, 72.2–73.9 and 100.0 (%), respectively. Thus the 1st to 3rd (additive) QTNs could not be detected, and the power for dominant and additive-dominant QTNs increased with the contribution of the dominant effect. The coefficients of variation of the QTN parameter estimates decreased as QTN size increased. Although the relationships among the QTN effect estimates across coding schemes were consistent with quantitative genetics theory, the estimates deviated from the true values. The false positive and false negative rates across the six coding schemes were 6.029–6.950 (‰) and 53.05–59.87 (%), respectively, indicating good control of both rates.

2. To remove the effect of population structure in GWAS, the natural population was replaced with an F2 population, with all other settings unchanged. Under the first four coding schemes, the power of QTL detection was 19.0, 57.5–58.0, 83.0–83.5, 0.0, 0.0, 0.0, 85.0–85.5, 83.5–84.5, 38.0–38.5 and 5.0 (%), respectively. Under the last two coding schemes, the power was 0.0, 0.0, 0.0, 17.5, 66.5–67.0, 88.0–89.0, 1.5, 5.5, 76.5 and 84.5–85.0 (%), respectively. The same trend was observed with the FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB and ISIS EM-BLASSO methods. The mean squared error of the QTL parameter estimates decreased as QTL size increased, and the QTL effect estimates were less biased than in the natural population. The false positive and false negative rates across the six coding schemes were 3.198–4.745 (‰) and 62.70–66.00 (%), respectively, indicating good control of both rates.

3. The mrMLM method was also used to identify associations of thousand-grain weight with 1619 bin markers in 278 IMF2 individuals from Zhou et al (2012), and of flowering time with 36901 SNP markers in 374 Asian rice accessions from Zhao et al (2011). Under the first four coding schemes, the additive genes GS3 and GW5/qsw5 for thousand-grain weight and the additive locus id7004091 for flowering time, in the 23.2 to 23.3 Mb region of chromosome 7, were detected, but the dominant locus wd8004070 for flowering time, in the 24.1 to 24.2 Mb region of chromosome 8, was not. Under the last two coding schemes, by contrast, GS3 and GW5/qsw5 for thousand-grain weight and id7004091 for flowering time were not found, but wd8004070 for flowering time was identified. These results are consistent with the Monte Carlo simulations and confirm that the coding values of marker genotypes affect GWAS.

In summary, loci with only dominant effects cannot be detected in datasets coded with the first four types of marker genotype coding values, while loci with only additive effects cannot be detected in datasets coded with the last two types.
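
The split between the first four and the last two coding schemes can be illustrated with a small calculation. The sketch below is not taken from the thesis. It assumes that p in (-2p, 1-2p, 2-2p) is the frequency of allele A (the abstract does not define it), and it uses the textbook F2 genotype frequencies of 1/4, 1/2 and 1/4 for aa, Aa and AA. The helper names `coding_schemes` and `r_squared` are hypothetical. For each scheme it computes the squared correlation with an allele-count code and with a heterozygote-indicator code.

```python
from fractions import Fraction


def coding_schemes(p):
    """Return the six schemes from the abstract as {name: (aa, Aa, AA)}.

    Assumption: p is the frequency of allele A; the abstract does not say.
    """
    return {
        "(-1,0,1)": (-1, 0, 1),
        "(0,0.5,1)": (0, Fraction(1, 2), 1),
        "(0,1,2)": (0, 1, 2),
        "(-2p,1-2p,2-2p)": (-2 * p, 1 - 2 * p, 2 - 2 * p),
        "(0,1,0)": (0, 1, 0),
        "(-1,1,-1)": (-1, 1, -1),
    }


def r_squared(x, y, freqs):
    """Squared correlation of two codings under given genotype frequencies."""
    mean_x = sum(f * a for f, a in zip(freqs, x))
    mean_y = sum(f * b for f, b in zip(freqs, y))
    cov = sum(f * (a - mean_x) * (b - mean_y) for f, a, b in zip(freqs, x, y))
    var_x = sum(f * (a - mean_x) ** 2 for f, a in zip(freqs, x))
    var_y = sum(f * (b - mean_y) ** 2 for f, b in zip(freqs, y))
    return cov * cov / (var_x * var_y)


# Textbook F2 genotype frequencies for (aa, Aa, AA); allele frequency is 1/2.
F2_FREQS = (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))

schemes = coding_schemes(Fraction(1, 2))
allele_count = schemes["(0,1,2)"]   # reference additive-type code
het_indicator = schemes["(0,1,0)"]  # reference dominance-type code

for name, codes in schemes.items():
    print(
        name,
        "r2 with allele count:", r_squared(codes, allele_count, F2_FREQS),
        "r2 with het indicator:", r_squared(codes, het_indicator, F2_FREQS),
    )
```

Under these assumptions, the first four schemes are linear rescalings of the allele count and the last two are linear rescalings of the heterozygote indicator, so each family is perfectly correlated within itself. At F2 frequencies the two families are uncorrelated with each other, which matches the pattern of zero power reported for purely dominant loci under the first four schemes and for purely additive loci under the last two. The exact zero depends on the F2 frequencies; at other genotype frequencies the two families need not be exactly uncorrelated.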
Keywords/Search Tags: Genome-wide association studies, genotype coding value, additive effect, dominant effect, mrMLM