Comparison And Application Of The Lasso-based Methods For Constructing Disease Risk Prediction Model By Integrating Clinical And Omics Characteristics

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: H F Zhang
Full Text: PDF
GTID: 2404330623975541
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: This study uses simulation and a case study to explore and compare the properties of four LASSO-based methods for integrating clinical and omics characteristics into disease risk prediction models, so as to provide suggestions for building such models in clinical practice.

Methods: First, we introduced the principles of the Naive-LASSO, Separate-LASSO, IPF-LASSO and Priority-LASSO methods. Second, we simulated six kinds of clinical and omics data with different sparsity and variable intensity, and varied the correlation within and between the clinical and omics data by changing the covariance matrix. Three correlation structures were considered: (1) variables were independent both within and between the clinical and omics data, with the covariance matrix set to the identity matrix; (2) variables within the clinical data or within the omics data showed a compound symmetric correlation, with no correlation between the two data types, so the covariance matrix was block diagonal; (3) variables were correlated both within and between the clinical and omics data, giving a more complicated covariance matrix. Together these settings formed 18 simulation scenarios. The four methods were then used to build a prediction model in each scenario, and their performance was compared by AUC, Brier score and the number of selected predictors. Last, the clinical and lncRNA expression data of diffuse large B-cell lymphoma (DLBCL) patients were analyzed with the four methods to establish a new DLBCL prognostic evaluation system. The performance of each method on the real data was evaluated with the same criteria as in the simulation study.

Results: In every simulation scenario, the IPF-LASSO and Priority-LASSO methods performed best among the four methods, while the Separate-LASSO method selected the most variables. The prediction accuracy of IPF-LASSO was better than that of Priority-LASSO, with higher AUC and lower Brier score. The prediction accuracy of all four methods was higher in data following the latter two correlation structures than in completely independent data. Regardless of the correlation structure, when the sparsity and variable intensity of the clinical data were held constant, the sparsity and variable intensity of the omics data had little influence on the prediction performance of the four methods; when the sparsity and variable intensity of the omics data were held constant, those of the clinical data had a great impact on prediction performance. The results of the case study were consistent with the simulation results: the DLBCL prognosis models established by IPF-LASSO and Priority-LASSO had higher prediction accuracy and fewer predictors, and the two models shared seven predictors.

Conclusion: Among the Naive-LASSO, Separate-LASSO, IPF-LASSO and Priority-LASSO methods, IPF-LASSO and Priority-LASSO select fewer variables, and the disease risk prediction models they produce usually have higher accuracy and greater practical clinical value. Therefore, when clinical and omics data are combined to establish a disease risk prediction model, these two methods can be given priority.
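
The three correlation structures and the two accuracy measures named in the Methods can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the simulation code of the thesis: the function names are hypothetical, all variances are fixed at 1, the outcome is assumed to be binary, and the third structure is reduced to a single constant cross-block correlation, whereas the abstract describes that matrix only as more complicated.

```python
def identity_cov(p_clin, p_omics):
    """Structure (1): all variables independent, so the matrix is the identity."""
    p = p_clin + p_omics
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]


def block_cov(p_clin, p_omics, rho_clin, rho_omics, rho_between=0.0):
    """Compound symmetry within the clinical block and within the omics block.

    rho_between = 0 gives the block diagonal matrix of structure (2).
    A nonzero rho_between is an ASSUMED, simplified stand-in for structure (3).
    Not every choice of correlations yields a valid (positive definite) matrix.
    """
    p = p_clin + p_omics
    cov = []
    for i in range(p):
        row = []
        for j in range(p):
            if i == j:
                row.append(1.0)              # unit variance (assumption)
            elif i < p_clin and j < p_clin:
                row.append(rho_clin)         # clinical-clinical pair
            elif i >= p_clin and j >= p_clin:
                row.append(rho_omics)        # omics-omics pair
            else:
                row.append(rho_between)      # clinical-omics pair
        cov.append(row)
    return cov


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

For example, `block_cov(2, 3, 0.3, 0.5)` returns a 5 x 5 block diagonal matrix with correlation 0.3 among the two clinical variables and 0.5 among the three omics variables. A higher AUC and a lower Brier score both indicate better prediction, which is the sense in which the Results compare the methods.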
Keywords/Search Tags: LASSO, clinical data, omics data, disease risk prediction model, diffuse large B-cell lymphoma