Development Of A Machine Learning Based Method To Improve The Genomic Prediction Accuracy And Computation Efficiency For Complex Traits

Posted on: 2021-01-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L L Yin
GTID: 1360330611483070
Subject: Animal breeding and genetics and breeding
Abstract/Summary:
Genomic prediction, which uses markers across the entire genome to predict unknown phenotypes, has become a newly emerging technology. Advances in high-throughput sequencing have dramatically reduced the cost of genotyping, so genomic prediction is now widely promoted and used in animal and plant breeding, and increasingly in polygenic risk scores for human diseases. Statistical methods play a crucial role in genomic prediction and directly affect predictive performance. BLUP-related methods, which use a relationship matrix, are computationally simple and efficient, but their prediction accuracy is often weak for certain traits because of their coarse model assumptions. Marker-effect-based Bayesian regression methods make more flexible and reasonable model assumptions and achieve higher prediction accuracy for the majority of traits; however, the heavy computational burden of estimating all unknown parameters results in lower computational efficiency. Developing a method that combines high accuracy, efficiency, and stability has therefore long been a key and difficult problem in genomic prediction.

In this study, we propose a machine-learning-based method named "KAML" that simultaneously improves prediction accuracy and computational efficiency for complex traits. The machine-learning procedure combines cross-validation, multiple regression, grid search, and bisection algorithms. It accurately includes large-effect QTNs (Quantitative Trait Nucleotides) as covariates and optimizes a SNP-weighted, trait-specific kinship matrix as the variance-covariance matrix of the random-effect term in a linear mixed model. Five model types are available, and KAML switches among them automatically according to the genetic architecture of each trait. Moreover, the whole optimization procedure can easily be parallelized across all available computing resources, which makes KAML more efficient for prediction.

KAML was compared with multiple methods on simulated traits, human diseases, and different economic traits of various species. The results showed that:

(1) With the flexible machine-learning strategy we designed, KAML accurately picks up SNPs with major effects, which can be integrated as covariates, optimizes an appropriate marker-weighted genomic relationship matrix, and then switches to the optimal model for prediction. This demonstrates the rationality, validity, and robustness of the strategy.

(2) KAML significantly outperforms GBLUP in prediction accuracy and performs similarly to or slightly better than Bayesian methods. In addition, by taking advantage of parallel acceleration, KAML is computationally efficient, roughly hundreds of times faster than Bayesian methods.

(3) Model parameters of KAML that are pre-estimated from a subset of individuals can be used directly to predict a much larger population. In that case KAML is as computationally efficient as regular GBLUP, with no significant decrease in prediction performance, which makes KAML promising for processing the large data sets of breeding programmes.

(4) Using the marker-weighted genomic relationship matrix derived from KAML, the prediction accuracy of SSGBLUP can be improved for both genotyped and non-genotyped individuals, which further broadens the application of KAML in genomic selection of livestock and genomic prediction of human diseases.

On the strength of its improved prediction accuracy and high computational efficiency, KAML is expected to become one of the most important methods and tools in the domain of genomic prediction.
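The idea of tuning a SNP-weighted kinship matrix by grid search and cross-validation can be illustrated with a minimal Python sketch. This is not the KAML implementation. The function names (`snp_scores`, `weighted_grm`, `gblup_predict`, `grid_search_weight`), the squared-marginal-correlation weights, the mixing parameter `a`, and the fixed heritability `h2` are all assumptions made for illustration. The abstract does not specify KAML's weighting scheme, and the QTN covariates, the bisection step, and the model switching are omitted here.

```python
import numpy as np

def snp_scores(M, y):
    """Hypothetical SNP importance: squared marginal correlation with the phenotype."""
    Zc = M - M.mean(axis=0)
    yc = y - y.mean()
    den = np.sqrt((Zc ** 2).sum(axis=0) * (yc ** 2).sum())
    den[den == 0] = np.inf          # monomorphic SNPs get a score of 0
    return (Zc.T @ yc / den) ** 2

def weighted_grm(M, weights):
    """SNP-weighted genomic relationship matrix, Z diag(w) Z' / sum(w * 2p(1-p)).

    M is an (n, m) genotype matrix coded 0/1/2. With equal weights this
    reduces to the usual unweighted (VanRaden-type) matrix used by GBLUP.
    """
    p = M.mean(axis=0) / 2.0        # allele frequencies
    Z = M - 2.0 * p                 # centred genotypes
    w = weights / weights.mean()    # rescale so the weights average 1
    return (Z * w) @ Z.T / np.sum(w * 2.0 * p * (1.0 - p))

def gblup_predict(K, y, train, test, h2=0.5):
    """Predict test phenotypes from training records with kinship K.

    h2 is fixed here as a simplifying assumption; in practice the
    variance components would be estimated (for example by REML).
    """
    lam = (1.0 - h2) / h2
    mu = y[train].mean()
    Ktt = K[np.ix_(train, train)]
    Kst = K[np.ix_(test, train)]
    alpha = np.linalg.solve(Ktt + lam * np.eye(len(train)), y[train] - mu)
    return mu + Kst @ alpha

def grid_search_weight(M, y, grid=(0.0, 0.25, 0.5, 0.75, 1.0), n_folds=5, seed=0):
    """Choose the mixing parameter a by k-fold cross-validation.

    Weights are (1 - a) + a * score, so a = 0 is plain GBLUP. Scores are
    computed from training individuals only, to avoid leaking test phenotypes.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    acc = {a: [] for a in grid}
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        s = snp_scores(M[train], y[train])
        s = s / s.mean() if s.mean() > 0 else np.ones_like(s)
        for a in grid:
            K = weighted_grm(M, (1.0 - a) + a * s)
            pred = gblup_predict(K, y, train, test)
            acc[a].append(np.corrcoef(pred, y[test])[0, 1])
    results = {a: float(np.mean(v)) for a, v in acc.items()}
    return max(results, key=results.get), results

if __name__ == "__main__":
    # Simulated data: 150 individuals, 300 SNPs, 5 large-effect causal SNPs.
    rng = np.random.default_rng(42)
    M = rng.binomial(2, 0.3, size=(150, 300)).astype(float)
    y = M[:, :5] @ rng.normal(0.0, 1.0, size=5) + rng.normal(0.0, 1.0, size=150)
    best, results = grid_search_weight(M, y)
    print("best a:", best)
    print({a: round(r, 3) for a, r in results.items()})
```

Because every grid point and every fold is evaluated independently, the two loops in `grid_search_weight` are the natural place for the kind of parallel execution the abstract mentions.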
Keywords/Search Tags: Genomic prediction/selection (GP/GS), Machine learning, Accuracy, Efficiency, KAML