
Genome-wide Association Analysis Based On Variable Selection Embedded Iterative Regression

Posted on: 2024-03-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G L Zhou
Full Text: PDF
GTID: 1523307160969439
Subject: Animal breeding and genetics and breeding
Abstract/Summary:
With the development of genotyping technology, genome-wide association studies (GWAS) have been widely used to detect genetic variation underlying human diseases and complex traits of animals and plants. However, GWAS has long been plagued by false positives. False positives arise from inflation of SNP P values, which may be caused by population structure and kinship among individuals. The mixed linear model (MLM) fits population structure as a fixed effect and the kinship matrix as a random effect, and can effectively control false positives. However, for complex traits controlled by several large-effect loci, the MLM approach may not be appropriate. A single-locus MLM does not yield good estimates of marker effects, because the model is misspecified whenever a trait is controlled by multiple loci, which is the case for most complex traits. Various multi-locus GWAS methods have been adopted to address this issue. They have significantly improved detection efficiency, but some deficiencies remain.

To address the shortcomings of existing multi-locus GWAS methods, we propose a novel multi-locus GWAS method named the Selector-Embedded Iterative Regression (SEIR) algorithm, which iteratively integrates single-marker scanning with an embedded selector. This study covers the establishment of the new method framework, verification of its efficacy under different scenarios, and analyses of real phenotypes from a wide range of species. The results are as follows:

(1) The framework of the SEIR method was established. SEIR combines a variable selection method with the fixed effect model in an iterative manner. In each iteration, the fixed effect model substantially reduces the number of markers, so that the selector can quickly select pseudo QTNs with a relatively slight computational burden. The selector then reselects pseudo QTNs, which serve as covariates in the fixed effect model in the next iteration. The process runs iteratively until no new pseudo QTNs appear and convergence is reached. SEIR thus combines the fast detection of fixed effect models with the efficient control of false positives offered by variable selection methods, yielding an efficient multi-locus GWAS method.

(2) Simulation results show that the fixed effect model can filter out most noisy SNPs and reduce the computational burden of variable selection methods. The embedded variable selection method gives the covariates in the fixed effect model more QTNs and fewer spuriously associated SNPs, so as to better control false positives and improve statistical power.

(3) Simulation results indicate that the statistical power of the SEIR method was higher than that of single-locus GWAS methods and also higher than that of existing multi-locus GWAS methods. In addition, compared with other multi-locus methods, the strategy by which SEIR selects pseudo QTNs can increase the detection of true QTNs by about 5%-10% and reduce the detection of spuriously associated SNPs by about 0.6%-5%.

(4) Results from real traits in different species showed that the SEIR method could not only detect more previously reported associated SNPs, but also detect new associated SNPs. For example, for the backfat thickness trait in pigs, the reliability of the new associated SNPs detected by SEIR was verified by combining multiple omics data, such as Hi-C, ChIP-seq, ATAC-seq and RNA-seq data.

(5) Based on the simulation results, an E-GWAS strategy integrating different GWAS methods was further proposed. Simulation results show that, compared with other GWAS methods, the E-GWAS strategy can further reduce the false positive rate by about 10%-15%.

In summary, we propose SEIR, a multi-locus GWAS method. The results show that SEIR has the advantages of fast computation, high statistical power and a wide range of application. This research contributes to the development of GWAS methods and complements and enriches the existing set of approaches.
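The iteration described in result (1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation. The abstract does not name the selector, so forward stepwise selection by BIC stands in for it here. The function names (`scan`, `select`, `seir_sketch`) and parameters (`top_k`, `max_qtn`, `max_iter`) are hypothetical, and population-structure covariates are omitted for brevity.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def scan(y, G, covar_idx):
    """Single-marker fixed-effect scan: one F-type statistic per SNP,
    with the current pseudo QTNs (covar_idx) fitted as covariates."""
    n, m = G.shape
    base = np.column_stack([np.ones(n)] + [G[:, j] for j in covar_idx])
    rss0 = rss(y, base)
    df = n - base.shape[1] - 1  # residual degrees of freedom of the larger model
    stats = np.zeros(m)
    for j in range(m):
        if j in covar_idx:
            continue  # assumption: covariate SNPs are not re-tested in this sketch
        rss1 = rss(y, np.column_stack([base, G[:, j]]))
        stats[j] = (rss0 - rss1) / (rss1 / df)
    return stats

def select(y, G, candidates, max_qtn):
    """Stand-in selector (assumption): forward stepwise selection by BIC."""
    n = len(y)
    X = np.ones((n, 1))
    best_bic = n * np.log(rss(y, X) / n) + np.log(n) * X.shape[1]
    chosen = []
    while len(chosen) < max_qtn:
        trials = []
        for j in candidates:
            if j in chosen:
                continue
            Xj = np.column_stack([X, G[:, j]])
            bic = n * np.log(rss(y, Xj) / n) + np.log(n) * Xj.shape[1]
            trials.append((bic, j))
        if not trials:
            break
        bic, j = min(trials)
        if bic >= best_bic:
            break  # no candidate improves BIC any further
        best_bic = bic
        chosen.append(j)
        X = np.column_stack([X, G[:, j]])
    return sorted(chosen)

def seir_sketch(y, G, top_k=20, max_qtn=10, max_iter=10):
    """Alternate scan and selection until the pseudo-QTN set stops changing."""
    pseudo = []
    for _ in range(max_iter):
        stats = scan(y, G, pseudo)
        # The scan reduces the marker set to the top_k strongest signals.
        top = np.argsort(stats)[::-1][:top_k].tolist()
        candidates = sorted(set(top) | set(pseudo))
        new_pseudo = select(y, G, candidates, max_qtn)
        if new_pseudo == pseudo:
            break  # convergence: no new pseudo QTNs appear
        pseudo = new_pseudo
    return pseudo
```

Here `G` is an individuals-by-SNPs genotype matrix coded 0/1/2 and `y` is the phenotype vector; `seir_sketch(y, G)` returns the column indices of the selected pseudo QTNs. Restricting the selector to the `top_k` scan hits is what keeps the selection step cheap, which mirrors the division of labour the abstract describes between the fixed effect model and the selector.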
Keywords/Search Tags:GWAS, multi-locus method, false positives, multi-locus model, SEIR