| Background: In the era of precision medicine, more accurate and efficient prognostic models can better guide clinical treatment and decision-making and thereby improve the survival of patients with cancer. However, because tumors are highly heterogeneous, traditional prognostic models, which rely on common clinical indicators (such as age and pathological stage) or on single-omics data, still perform poorly. With the development of high-throughput sequencing technology, tumor multi-omics data have become easier to acquire. Integrating different omics can effectively improve the predictive ability of a model and contribute to a deeper understanding of the mechanisms of tumor occurrence and development. However, multi-omics integration also brings a series of challenges in statistical modeling, a key one being how to integrate multiple omics effectively. A reasonable idea is to integrate the data according to the biological relationships among omics, which not only improves predictive performance but also enhances the interpretability of the model. However, because of pleiotropy, specific associations among different omics are difficult to determine in advance. A suitable solution is to describe the relationships among omics jointly by constructing a latent variable for each omics. Previous studies have tried to build an integrative predictive model based on two omics in this way, but limitations remain: such models accommodate only two omics and do not consider the Weibull survival distribution. Objective: In this study, based on the Bayesian structural equation model framework, we propose a new multi-omics integrated prognostic model, the Bayesian structural equation model for survival (BSEMsurv). Through simulation and case studies, the statistical performance of the proposed BSEMsurv in integrating multi-omics data is fully
investigated, which provides new ideas and methodological guidance for multi-omics integration modeling. Contents and methods: 1. Theoretical model construction: Combining Bayesian structural equation modeling with survival regression theory, we establish the novel integrative prognostic model. Each omics is represented by its own latent variable, and structural equations link the different latent variables. The structural equations are then embedded into a parametric survival regression model, i.e., the accelerated failure time (AFT) model. Within the Bayesian framework, weakly informative priors are assigned to the regression coefficients and variances. Posterior sampling is carried out with the No-U-Turn Sampler (NUTS). 2. Simulation study: Based on the parametric survival regression model, simulated data are generated under different conditions, including Lognormal and Weibull survival distributions, different sample sizes (n = 200 or 500), different numbers of omics variables (q1 = q2 = q3 = 5 or 10), and different censoring rates (0%, 20%, 50%, 80%). In addition, the influence of intra-omics correlation (i.e., residual correlation) on the model is examined. The performance of BSEMsurv is assessed by comparison with other methods (e.g., the simple-integrated model, iBAG, BGLR, blockForest, DCAP, and IPF-LASSO). Because different prior distributions (inverse-Gamma, half-Cauchy, Student's t) may influence a Bayesian model, we also conduct a prior sensitivity analysis. 3. Case study: Using multi-omics data from The Cancer Genome Atlas (TCGA), we choose gastric cancer and lung adenocarcinoma as two validation sets to evaluate the practical effectiveness of BSEMsurv. Overall survival (OS) is the outcome, and clinical covariates (i.e., age, sex, and pathological stage) and three omics data types (i.e., mRNA gene expression, microRNA, and DNA
methylation) are included in the model. Results: 1. Results of the simulation study. 1.1 Performance evaluation of BSEMsurv under the Lognormal distribution: BSEMsurv shows low estimation bias under all censoring scenarios (0%, 20%, 50%, 80%). Although the bias of several parameters increases at high censoring rates, it remains within a reasonable range, with MSEs < 0.15. Increasing the number of omics variables or changing the sample size does not significantly increase the estimation bias. In terms of model fit, BSEMsurv has better goodness-of-fit than the simple-integrated model or iBAG. In terms of prediction, BSEMsurv has the lowest prediction error compared with the simple-integrated model and iBAG under all censoring conditions. In terms of C-index, BSEMsurv performs similarly to IPF-LASSO and the simple-integrated model and is superior to the others, such as BGLR and iBAG. In addition, compared with the original BSEMsurv, BSEMsurv with residual correlation improves the estimation accuracy of the corresponding variances but shows no clear advantage in prediction performance. Prior sensitivity analysis shows that different priors have no significant influence on parameter estimation or prediction performance. 1.2 Performance evaluation of BSEMsurv under the Weibull distribution: Under different censoring rates (0%, 20%, 50%, 80%), BSEMsurv also shows low estimation bias. As the censoring rate increases, the bias of several parameters increases but remains within a reasonable range, with MSEs < 0.1. Changing the number of omics variables or the sample size does not significantly increase the estimation bias. In terms of model fit, BSEMsurv again has better goodness-of-fit than the simple-integrated model or iBAG. BSEMsurv also has lower prediction error than the simple-integrated model and iBAG under all censoring conditions. The C-index of BSEMsurv is
slightly lower than that of IPF-LASSO and the simple-integrated model under heavy censoring but still higher than that of the other methods. In addition, under the Weibull distribution, BSEMsurv with residual correlation improves the estimation accuracy of the corresponding variances but shows no significant advantage in prediction performance. Prior sensitivity analysis suggests that, under the Weibull condition as well, different priors have no significant influence on parameter estimation or prediction performance. 2. Results of the case study. Bioinformatics preprocessing and LASSO variable selection were applied to reduce the dimensionality of the omics data. The final gastric cancer dataset has a total sample size of N = 269 and a censoring rate of 59.11% (110 events); 6 mRNAs, 6 microRNAs, and 6 methylation sites were selected. The final lung adenocarcinoma dataset has a total sample size of N = 419 and a censoring rate of 63.96% (151 events); 10 mRNAs, 11 microRNAs, and 9 methylation sites were selected. The goodness-of-fit test of the survival distribution indicated that the Weibull assumption is satisfied (P > 0.05), so the Weibull-based integrative AFT model was adopted for subsequent modeling. The results are as follows. 1) In the gastric cancer model, age, pathological stage, and the mRNA latent variable have significant effects on survival time, with coefficients and 95% credible intervals of -0.3187 (95% CI: [-0.496, -0.149]), -0.3764 (95% CI: [-0.591, -0.177]), and -0.9848 (95% CI: [-1.369, -0.644]), respectively. Compared with the simple-integrated model, iBAG, and the other methods, BSEMsurv achieves the best performance in model fit and prediction (LOOIC = 1052.7, WAIC = 1050.6, MSE = 1.7774, RMSE = 1.3332, MAE = 1.1387, C-index = 0.7630). 2) In the lung adenocarcinoma model, age, pathological stage, and the mRNA latent variable also have significant effects on survival time; the corresponding coefficients and 95% credible intervals
are -0.1316 (95% CI: [-0.261, -0.003]), -0.3345 (95% CI: [-0.470, -0.202]), and -0.8391 (95% CI: [-1.078, -0.620]), respectively. Compared with the simple-integrated model, iBAG, and the other methods, BSEMsurv also achieves better performance in model fit and prediction (LOOIC = 1550.0, WAIC = 1548.0, MSE = 1.8813, RMSE = 1.3716, MAE = 1.1150, C-index = 0.7527). Conclusion: This study builds a new multi-omics integrative prognostic model, which provides a new idea for tumor modeling. Within the unified framework of the Bayesian structural equation model, the model can effectively integrate different omics by exploiting the biological relationships among them. Based on parametric survival regression, the statistical performance of BSEMsurv is examined under both Lognormal and Weibull survival distributions. Through systematic simulation studies and TCGA case studies (gastric cancer and lung adenocarcinoma), we demonstrate that the integrative model has advantages in parameter estimation and prediction performance. In the future, the model can be extended to non-parametric survival regression and combined with dimensionality reduction strategies. |
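
The abstract refers to a Weibull-based AFT model fitted to right-censored survival data and evaluated with the C-index. The sketch below illustrates those two ingredients only; it is not the authors' BSEMsurv implementation. The function names `weibull_aft_loglik` and `harrell_c_index` are hypothetical, the linear predictor is passed in as a plain array (in BSEMsurv it would presumably combine clinical covariates and omics latent variables), and the log-linear parameterization log T = linpred + sigma * W with a standard minimum extreme-value error W is the usual textbook form, assumed here rather than taken from the source.

```python
import numpy as np


def weibull_aft_loglik(time, event, linpred, sigma):
    """Right-censored log-likelihood of a Weibull AFT model (illustrative).

    Assumed form: log T = linpred + sigma * W, where W follows the standard
    minimum extreme-value distribution, so that T is Weibull distributed.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    z = (np.log(time) - np.asarray(linpred, dtype=float)) / sigma
    # Observed events contribute the log-density of T.
    log_density = z - np.exp(z) - np.log(sigma) - np.log(time)
    # Censored observations contribute the log-survival probability.
    log_survival = -np.exp(z)
    return float(np.sum(np.where(event == 1, log_density, log_survival)))


def harrell_c_index(time, event, predicted_time):
    """Harrell's concordance index for predicted survival times (illustrative).

    A pair (i, j) is comparable when subject i had an event and
    time[i] < time[j]; it is concordant when the prediction for i is also
    shorter. Ties in the prediction count as one half.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if event[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if predicted_time[i] < predicted_time[j]:
                    concordant += 1.0
                elif predicted_time[i] == predicted_time[j]:
                    concordant += 0.5
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    t = [1.0, 2.0, 3.0]
    d = [1, 1, 0]
    print(weibull_aft_loglik(t, d, linpred=[0.0, 0.5, 1.0], sigma=1.0))
    print(harrell_c_index(t, d, predicted_time=[1.2, 1.9, 4.0]))
```

In a Bayesian fit, a log-likelihood of this kind would be combined with priors on the coefficients and variances and sampled with NUTS; the C-index would then be computed from posterior predictions on the observed data.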