
Joint Model Prediction Of Medical Data Based On XGBoost

Posted on: 2024-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Zheng
Full Text: PDF
GTID: 2544306941977699
Subject: Applied statistics
Abstract/Summary:
In biomedical research, analyzing survival data or longitudinal data in isolation can bias estimation and prediction; a joint model of the two can effectively address this problem. Traditional longitudinal submodels typically use linear mixed-effects models, which impose strict assumptions on the data distribution. This thesis therefore uses machine learning algorithms, which offer strong predictive performance and wide applicability, to predict the longitudinal data, and then combines them with a survival submodel to build a joint model. Through numerical simulation, survival and longitudinal data are generated under two settings: one in which the error term of the random effect follows a normal distribution, and one in which it does not. First, five machine learning algorithms were fitted to the simulated longitudinal data and compared by mean squared error; XGBoost gave the best fit. Second, in terms of goodness of fit, the standalone survival regression model was compared with the joint model, and the joint model fitted better. Third, in terms of predictive performance, traditional joint models were compared with joint models that use a machine learning algorithm to fit the longitudinal variable; the latter predicted better.

An example analysis was then carried out on primary biliary cirrhosis data. Existing studies of these data mainly analyze the effect of baseline measurements on survival outcomes, and the few that use joint models are limited to traditional joint models. This thesis builds an XGBoost prediction model for the longitudinal indicators that significantly affect survival outcomes, and uses it for survival risk-factor analysis and individualized precision-medicine analysis. First, missing data were imputed with a random forest method. Next, variables were screened: survival variables with a significant effect on survival outcomes were selected using Kaplan-Meier curves combined with a test of the proportional hazards assumption and included in the survival submodel, while longitudinal variables were screened using random forest importance ranking and recursive feature elimination. Gender, ascites, hepatomegaly, spider veins, albumin, and other indicators were found to significantly affect survival outcomes; of these, albumin is a longitudinally measured indicator. An XGBoost regression model was then fitted to the longitudinal variable. To avoid overfitting, only indicators strongly correlated with the longitudinal measurement, selected by analysis of variance and the maximal information coefficient, were included in the longitudinal prediction model. Finally, a joint model was constructed by linking the survival submodel and the longitudinal submodel through the predicted values of the longitudinal variable. The results indicate that the protein indicator is negatively associated with the hazard rate, consistent with clinical findings. The XGBoost joint model was then used for case analysis of two types of patients with different outcomes, and the model's predictions were consistent with the patients' actual conditions. On this basis, targeted diagnosis and treatment suggestions can be proposed for specific cases, with the aim of delaying disease progression.
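
The two-stage linkage described above, in which predicted values of the longitudinal variable enter the survival submodel as a covariate, can be illustrated with a minimal sketch. This is not the thesis's implementation. To stay dependency-free, a per-subject least-squares line stands in for the XGBoost regressor, the data are simulated, and the covariate is the predicted value at a single hypothetical horizon `T_STAR` rather than a time-varying trajectory. All names (`fit_line`, `fit_cox_one_covariate`, `GAMMA_TRUE`, `T_STAR`) and all numeric settings are assumptions made for illustration.

```python
import math
import random


def fit_line(ts, ys):
    """Least-squares line y = a + b*t for one subject (stand-in for XGBoost)."""
    n = len(ts)
    t_bar, y_bar = sum(ts) / n, sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sxx
    return y_bar - slope * t_bar, slope


def fit_cox_one_covariate(times, events, z, n_iter=25):
    """Newton-Raphson on the Cox partial likelihood with a single covariate."""
    z_bar = sum(z) / len(z)
    zc = [v - z_bar for v in z]  # centering does not change the coefficient
    beta = 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored subjects contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, z_j in zip(times, zc):
                if t_j >= t_i:  # subject j is still at risk at time t_i
                    w = math.exp(beta * z_j)
                    s0 += w
                    s1 += w * z_j
                    s2 += w * z_j * z_j
            grad += zc[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        step = grad / info
        beta += step
        if abs(step) < 1e-8:
            break
    return beta


random.seed(0)
GAMMA_TRUE = -1.5   # assumed: a higher marker value lowers the hazard
T_STAR = 2.0        # assumed prediction horizon
visit_times = [0.0, 1.0, 2.0, 3.0]

times, events, predicted = [], [], []
for _ in range(300):
    b = random.gauss(0.0, 0.5)                      # subject random intercept
    true_at_star = 3.5 + b - 0.1 * T_STAR           # true marker value at T_STAR
    obs = [3.5 + b - 0.1 * t + random.gauss(0.0, 0.2) for t in visit_times]

    # Stage 1: longitudinal submodel predicts the marker at T_STAR.
    a_hat, slope_hat = fit_line(visit_times, obs)
    predicted.append(a_hat + slope_hat * T_STAR)

    # Simulated survival time with administrative censoring at t = 10.
    hazard = 0.1 * math.exp(GAMMA_TRUE * (true_at_star - 3.3))
    t_event = random.expovariate(hazard)
    times.append(min(t_event, 10.0))
    events.append(t_event <= 10.0)

# Stage 2: survival submodel uses the stage-1 prediction as its covariate.
beta_hat = fit_cox_one_covariate(times, events, predicted)
print(f"estimated association: {beta_hat:.3f} (simulated truth {GAMMA_TRUE})")
```

A negative `beta_hat` here corresponds to the kind of finding reported for the protein indicator: higher predicted values go with a lower hazard. The sketch ignores, among other things, that real measurements stop at the event time and that the prediction error from stage 1 is not propagated into stage 2.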
Keywords/Search Tags: Joint model, Machine learning, XGBoost, Predictive performance, Case analysis