Mining genes for complex traits is fundamental to molecular biology research, gene editing, and crop breeding. Over the past two decades, genome-wide association studies (GWAS) have become a main approach for dissecting the genetic basis of complex traits. It is therefore of great significance to study GWAS methodology and to develop the corresponding software. Since the establishment of the mixed-model method for GWAS, a series of methods have been proposed, and corresponding software packages developed, to mine genes for quantitative traits. Although these methods, along with their fast algorithms, can detect quantitative trait nucleotides (QTNs) and estimate their effects, almost all of them estimate only the allelic substitution effect of each QTN rather than its additive and dominance effects, so both the estimated effects and the polygenic background control are incomplete. More importantly, it is difficult for current GWAS mixed-model methods to directly detect QTN×environment interactions (QEIs) for quantitative traits in natural populations. To solve these problems, a new method, 3VmrMLM, was established in Dr. Zhang's laboratory to detect QTNs, QEIs, and QTN×QTN interactions (QQIs) for quantitative traits in association mapping populations. To popularize and apply this method, the QTN and QEI modules of the C++ software IIIVmrMLM were developed in this study to dissect the genetic basis of quantitative traits.

First, based on the 3VmrMLM method established by our laboratory, the QTN and QEI modules of the R software IIIVmrMLM were rewritten as QTN and QEI modules of the C++ software IIIVmrMLM. Then, a matrix operation library, parallel processing, and other computing and numerical techniques were used to reduce running time. Finally, real rice datasets and simulated datasets were re-analyzed with these modules to validate them. The main results are as follows.

1. Based on the high-performance C++ libraries Boost and Eigen, OpenMP was used to construct a shared-memory parallel system in
multi-threaded programming design, and the QTN and QEI modules of the C++ software IIIVmrMLM were developed. Each module includes two classes: data preprocessing and algorithm. In the data preprocessing class, the dynamic_bitset data structure in the Boost library is used to convert the string data of marker genotypes into binary data, which theoretically reduces memory usage to 1/16 of the original; all individuals are matched across marker genotypes, trait phenotypes, population structure, the kinship matrix, and covariates. In the algorithm class, all individuals matched during data preprocessing are passed to the method module selected by the user. In the method module, a parallelized design is used; bit operations are used to improve efficiency in calculating the kinship matrix; Eigen's vector and matrix data structures are used for numerical operations; and all results are saved to the path selected by the user.

2. A GUI was developed with the C++ class library framework Qt, offering a highly intuitive and modular design. First, the modular code is connected to the designed UI through Qt's asynchronous signal-slot mechanism to form a graphical user interface, which separates the source code from user operations. Second, users can run the software without handling any code. This both lowers the threshold for using the software and protects the code, preventing users from accidentally modifying it.

3. A genome-wide association study was conducted for rice yield per plant in 1,439 indica hybrid F1 cultivars grown in Hangzhou and Sanya, using 1,098,527 SNP markers. The QTN detection modules of the C++ and R software were used to analyze rice yield per plant in Hangzhou, in Sanya, and as the average phenotype across the two environments. The results showed that the positions and numbers of QTNs detected in these three datasets by the R software are consistent with those detected by the C++ software, being 32, 23, and 17 QTNs, respectively, although the effect estimates differ at the 1e-6 level. The QEI detection modules of the C++ and R software were used to
jointly analyze rice yield per plant and grain number per plant in the above two environments. The results showed that the R software identified 25 and 48 QTNs and 20 and 25 QEIs for the two traits, respectively, whereas the C++ software identified 25 and 48 QTNs and 20 and 22 QEIs, respectively. The main-effect QTNs detected by the R software are the same as those detected by the C++ software, although the effect estimates differ at the 1e-6 level. For the second trait, the R software detected three more QEIs than the C++ software; all other results are the same. These differences mainly arise from the different implementations of the quasi-Newton optimization algorithm used by the two programs in single-marker scanning, which lead to slight differences in the final results.

The R software (10 threads) and the C++ software (8, 16, and 32 threads) were used to calculate the kinship matrix and to detect QTNs and QEIs for yield per plant. Calculating the kinship matrix took 927 s with the R software, whereas it took 110 s, 64 s, and 38 s with the C++ software under 8, 16, and 32 threads, respectively. Detecting main-effect QTNs for yield per plant took 81.26 minutes with the R software, whereas it took 47.84, 33.24, and 22.31 minutes with the C++ software under the same three thread counts, respectively. Detecting QEIs for yield per plant took 316 minutes with the R software, whereas it took 189, 92, and 69 minutes with the C++ software, respectively. Thus, even with 8 threads, fewer than the 10 used by the R software, the C++ software was about 8.4, 1.7, and 1.7 times as fast on the three tasks, indicating that it substantially reduces running time.

In conclusion, the QTN and QEI modules of the C++ software IIIVmrMLM feature low memory usage, high computational efficiency, and a user-friendly interface, and they can analyze thousands of individuals, each with millions of markers, on a personal computer. They provide an important tool for detecting QTNs and QEIs in GWAS.
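
The bit-level genotype storage and the bit operations for the kinship matrix described in result 1 can be illustrated with a minimal sketch. It is not the IIIVmrMLM implementation, and it rests on several assumptions made only for illustration: each genotype is reduced to a single bit (homozygous reference or alternate, no heterozygotes or missing calls); a plain std::vector of 64-bit words stands in for Boost's dynamic_bitset so that the example needs only the standard library; the similarity measure is the simple fraction of matching markers, not the kinship formula of 3VmrMLM; and the names PackedGenotype, pack, popcount64, ibs_similarity, and similarity_matrix are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical packed genotype of one individual: one bit per marker
// (0 = reference homozygote, 1 = alternate homozygote). Assumption only.
struct PackedGenotype {
    std::size_t n_markers = 0;
    std::vector<std::uint64_t> words;  // 64 markers per word, unused bits stay 0
};

// Pack a string of '0'/'1' genotype codes into 64-bit words.
PackedGenotype pack(const std::string& codes) {
    PackedGenotype g;
    g.n_markers = codes.size();
    g.words.assign((codes.size() + 63) / 64, 0);
    for (std::size_t i = 0; i < codes.size(); ++i) {
        assert(codes[i] == '0' || codes[i] == '1');
        if (codes[i] == '1') {
            g.words[i / 64] |= (std::uint64_t{1} << (i % 64));
        }
    }
    return g;
}

// Portable population count: clears the lowest set bit until none remain.
int popcount64(std::uint64_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}

// Fraction of markers at which two individuals carry the same genotype.
// XOR marks the mismatching markers; popcount counts them 64 at a time.
double ibs_similarity(const PackedGenotype& a, const PackedGenotype& b) {
    assert(a.n_markers == b.n_markers && a.n_markers > 0);
    std::size_t mismatches = 0;
    for (std::size_t w = 0; w < a.words.size(); ++w) {
        mismatches += popcount64(a.words[w] ^ b.words[w]);
    }
    return 1.0 - static_cast<double>(mismatches) / a.n_markers;
}

// Symmetric similarity matrix over all individuals. Each (i, j) pair is
// written by exactly one iteration of the outer loop, so rows can be
// distributed across threads. The pragma is ignored without OpenMP.
std::vector<std::vector<double>> similarity_matrix(
        const std::vector<PackedGenotype>& g) {
    const long n = static_cast<long>(g.size());
    std::vector<std::vector<double>> k(n, std::vector<double>(n, 1.0));
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        for (long j = i + 1; j < n; ++j) {
            const double s = ibs_similarity(g[i], g[j]);
            k[i][j] = s;
            k[j][i] = s;
        }
    }
    return k;
}
```

For example, pack("1100") and pack("1000") differ at one of four markers, so ibs_similarity returns 0.75 for that pair. Under the one-bit assumption, a genotype that was stored as a two-character string (16 bits) shrinks to a single bit, which is one way a reduction to 1/16 could arise; the encoding actually used in the software may differ.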