
An Interpretable Ensemble Framework Based On Tree Models For Forecasting The Cases Of COVID-19 In The United States

Posted on: 2024-09-01    Degree: Master    Type: Thesis
Country: China    Candidate: H L Zheng    Full Text: PDF
GTID: 2544307088477864    Subject: Public health
Abstract/Summary:
Objective: The coronavirus disease 2019 (COVID-19) pandemic was one of the most serious public health crises in the world, and the United States was one of the countries most severely affected. This study designed an interpretable ensemble framework based on tree models to predict the number of novel coronavirus infections and to analyze its important influencing factors, which is of great significance to the formulation of epidemic-related prevention and control measures. Owing to its efficiency, speed, and strong interpretability, the framework can be widely applied to the prediction and influencing-factor analysis of other similar infectious diseases, and it could provide scientific support for the management, prevention, and control of such epidemics.

Methods: An interpretable ensemble framework based on machine learning was designed to forecast daily new cases of COVID-19 in the United States and to identify the important factors related to COVID-19. The framework was divided into four layers. The first layer was data preparation, including the number of cases and four types of features (self-protection, social prevention and control, community mobility, and time index). The second layer established three decision-tree-based machine learning models, namely random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), and used Hyperopt to optimize their hyperparameters. The third layer established three linear ensemble models, namely simple average (SA), ordinary least squares (OLS), and least absolute deviation (LAD), to combine the base-model outputs for better prediction accuracy. The fourth layer used SHapley Additive exPlanations (SHAP) values to obtain feature importance rankings that explain both the machine learning models and the linear ensemble models. Root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were selected to evaluate the fitting and prediction performance of all models.

Results: Among the three base machine learning models in the second layer of the ensemble framework, prediction accuracy in descending order was LightGBM, XGBoost, and RF. The three ensemble models in the third layer performed better than the base models at data inflection points. The optimized LAD ensemble was the most accurate prediction model; it reduced the MAE of the best base learner (LightGBM) by approximately 3.111%, and its MAPE was as low as 6.088%. Meanwhile, indicators of vaccination, mask wearing, mobility, and social prevention and control ranked high in the SHAP-based importance ranking.

Conclusions: Within the interpretable ensemble framework of this study, the optimized LAD ensemble had the best prediction accuracy. Indicators of vaccination, mask wearing, reduced mobility, and social prevention and control had a positive effect on the prevention and control of COVID-19.
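As a reading aid, the following is a minimal, illustrative Python sketch of the four-layer workflow described above (tree-based base learners, linear ensembles, RMSE/MAE/MAPE evaluation, and SHAP importance). It is not the thesis code: the data are synthetic placeholders, the Hyperopt tuning step is omitted, and all model settings are assumed defaults rather than the tuned values reported in the study.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
import xgboost as xgb
import lightgbm as lgb
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg
import shap

rng = np.random.default_rng(0)

# Layer 1: data preparation. Synthetic stand-in for the four feature groups
# (self-protection, social prevention and control, community mobility, time index)
# and for the daily new-case counts used in the thesis.
n, p = 500, 8
X = rng.normal(size=(n, p))
y = 100.0 + X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)
split = int(0.8 * n)
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

# Layer 2: three tree-based base learners (Hyperopt tuning omitted here).
base_models = {
    "RF": RandomForestRegressor(n_estimators=300, random_state=0),
    "XGBoost": xgb.XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=0),
    "LightGBM": lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05, random_state=0),
}
P_tr, P_te = [], []
for name, model in base_models.items():
    model.fit(X_tr, y_tr)
    P_tr.append(model.predict(X_tr))
    P_te.append(model.predict(X_te))
P_tr, P_te = np.column_stack(P_tr), np.column_stack(P_te)

# Layer 3: linear ensembles of the base predictions (SA, OLS, LAD).
sa_pred = P_te.mean(axis=1)                                  # simple average
ols_fit = sm.OLS(y_tr, sm.add_constant(P_tr)).fit()          # ordinary least squares
ols_pred = ols_fit.predict(sm.add_constant(P_te))
lad_fit = QuantReg(y_tr, sm.add_constant(P_tr)).fit(q=0.5)   # LAD via median regression
lad_pred = lad_fit.predict(sm.add_constant(P_te))

# Evaluation: RMSE, MAE, MAPE on the held-out split.
def report(name, y_true, y_hat):
    rmse = np.sqrt(mean_squared_error(y_true, y_hat))
    mae = mean_absolute_error(y_true, y_hat)
    mape = np.mean(np.abs((y_true - y_hat) / y_true)) * 100
    print(f"{name}: RMSE={rmse:.3f}  MAE={mae:.3f}  MAPE={mape:.2f}%")

for name, y_hat in (("SA", sa_pred), ("OLS", ols_pred), ("LAD", lad_pred)):
    report(name, y_te, y_hat)

# Layer 4: SHAP feature importance for one base learner (mean |SHAP| per feature).
explainer = shap.TreeExplainer(base_models["LightGBM"])
shap_values = explainer.shap_values(X_te)
importance = np.abs(shap_values).mean(axis=0)
print("Feature importance ranking (indices):", np.argsort(importance)[::-1])

In this sketch the LAD ensemble is fitted as median (quantile 0.5) regression on the base-model predictions, one standard way to obtain a least-absolute-deviation combiner; unlike the OLS combiner, it is less sensitive to large errors such as those that occur around data inflection points.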
Keywords/Search Tags:COVID-19, LightGBM, ensemble framework, interpretability, disease prediction