| Genomic selection (GS), proposed at the beginning of the 21st century, uses high-density genetic markers across the entire genome and predicts complex traits, traits controlled by multiple genes, and traits that are difficult to observe and record directly more accurately than traditional selection methods. Since its proposal, researchers have contributed substantially to the research and application of genomic selection, proposing many different algorithms and statistical models. However, current genomic selection methods are based on SNP data analysis and lack computational models for joint multi-omics analysis, so they cannot fully exploit multi-level molecular information. Therefore, this paper first used public datasets of fruit flies (Drosophila) and mice to explore whether integrating multi-omics data to predict phenotypes has advantages over using genomic data alone. It then proposed a strategy of using Gene Ontology (GO) to pre-select multi-omics feature markers for phenotype prediction, which generally improved the prediction accuracy of different traits in various populations. Finally, a phenotype prediction model combining multi-omics data and prior biological knowledge was established based on deep learning. The specific contents and results of this paper are as follows: (1) The effect of multi-omics data on phenotype prediction accuracy. This paper uses Drosophila and mouse datasets from public databases to explore whether multi-omics data from various experimental populations have predictive value. Multi-omics phenotype prediction models have advantages over genomic phenotype prediction models. In genomic phenotype prediction, gradient boosting and random forest methods improved the accuracy of genomic prediction in Drosophila populations by an average of 29.43% and 10.37%, respectively, compared to GBLUP. Among the six trait-sex combinations in the Drosophila population, Bayes B had the highest average prediction accuracy. Combining
genomic and transcriptomic data in a multi-omics prediction model, GTBLUP significantly improved the accuracy of phenotype prediction compared to GBLUP, especially for Drosophila starvation resistance, where GTBLUP improved accuracy by 33.59% (females) and 31.06% (males) and outperformed all genomic prediction models. Transcriptomic data from liver and hippocampus tissues improved the accuracy of mouse body weight prediction by 140.8% to 380.8% and 85.3% to 158.4%, respectively, with the larger improvement coming from liver transcriptomic data. (2) Integration of a GO functional feature selection strategy. This paper uses gene ontology to classify multi-omics molecular markers by biological function and analyzes the impact of the features included in each biological function on phenotype prediction accuracy. With gene ontology integrated into the phenotype prediction models, [...] predicted five out of six trait-sex combinations in the Drosophila population with higher accuracy than [...]. The improvement of [...] over [...] ranged from 45.2% to 540.6% in these combinations. In the mouse population, using transcriptomic data from liver tissue, [...] improved the accuracy by 106.8% to 579.7% compared to [...], and by 19.6% to 96.4% compared to [...]. Adding function-related genes collected from the literature, and GO terms related to the traits analyzed in this paper, as features in the prediction model showed that important function-related genes and GO terms can both reduce model residual variance. In addition, varying the number of selected GO terms along a gradient revealed a turning point in prediction accuracy when the "TOP10" GO terms were selected. (3) Phenotype prediction method based on neural networks. This paper constructed a neural-network-based phenotype prediction model that integrates multi-omics data and prior biological knowledge. The model reduces the number of ineffective connections in the network by using gene annotation information and biological function annotation information. Applied to
real Drosophila and mouse population data, the model improved the prediction accuracy for starvation resistance in female Drosophila by 61.07% and 20.57% compared to the GBLUP and GTBLUP methods, respectively, and by 45.39% and 10.94% in male Drosophila. For the startle response trait in Drosophila, the model improved the average prediction accuracy across male and female flies by 11.64% compared to the GBLUP method. For 6-10 week body weight traits in mice, the prediction accuracy of the model was 7.1% higher than that of the GTBLUP (Liver) method. In addition, important genes that affect Drosophila starvation resistance, such as CG13229, CG13758, Takr99, and CG31343, were identified. In summary, this paper provides reference analysis results and feature pre-selection strategies for integrating multi-omics data for phenotype prediction, and provides new methods for multi-omics phenotype prediction. |
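
The idea in part (3), using annotation to remove ineffective network connections, can be illustrated with a masked dense layer. The following is a minimal sketch under stated assumptions, not the architecture used in this paper: the layer sizes, the `tanh` activation, the toy marker-to-gene and gene-to-GO mappings, and the helper names `build_mask` and `masked_forward` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def build_mask(n_inputs, n_groups, membership):
    """Return an (n_inputs, n_groups) 0/1 mask from (input_idx, group_idx) pairs."""
    mask = np.zeros((n_inputs, n_groups))
    for i, g in membership:
        mask[i, g] = 1.0
    return mask

def masked_forward(x, weights, mask, bias):
    """Dense layer whose weights are zeroed wherever the annotation mask is 0."""
    return np.tanh(x @ (weights * mask) + bias)

rng = np.random.default_rng(0)

# Hypothetical toy annotation (assumption): 6 markers map to 3 genes,
# and the 3 genes map to 2 GO terms (gene 1 belongs to both terms).
marker_to_gene = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
gene_to_go = [(0, 0), (1, 0), (1, 1), (2, 1)]

mask1 = build_mask(6, 3, marker_to_gene)   # marker -> gene connections
mask2 = build_mask(3, 2, gene_to_go)       # gene -> GO term connections

# Randomly initialised weights; a real model would learn these by training.
w1, b1 = rng.normal(size=(6, 3)), np.zeros(3)
w2, b2 = rng.normal(size=(3, 2)), np.zeros(2)
w_out = rng.normal(size=(2, 1))

x = rng.normal(size=(4, 6))                # 4 individuals, 6 marker features
gene_layer = masked_forward(x, w1, mask1, b1)
go_layer = masked_forward(gene_layer, w2, mask2, b2)
y_hat = go_layer @ w_out                   # one predicted phenotype per individual

n_active = int(mask1.sum() + mask2.sum())  # connections kept by the annotation
n_dense = mask1.size + mask2.size          # connections in a fully dense network
```

In this toy setup the masks keep 10 of the 24 connections a fully connected network of the same shape would have, and each gene node responds only to its own annotated markers. During training, the same mask would also have to be applied to the gradients (or the weights re-masked after each update) so that pruned connections stay at zero.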