
An Interpretable Ensemble Framework Based On Tree Models For Forecasting The Cases Of COVID-19 In The United States

Posted on: 2024-09-01    Degree: Master    Type: Thesis
Country: China    Candidate: H L Zheng    Full Text: PDF
GTID: 2544307088477864    Subject: Public health
Abstract/Summary:
Objective: The coronavirus disease 2019 (COVID-19) pandemic was one of the most serious public health crises in the world, and the United States was one of the countries most severely affected. This study designed an interpretable ensemble framework based on tree models to predict the number of novel coronavirus infections and to analyze its important influencing factors, which is of great significance to the formulation of epidemic-related prevention and control measures. Owing to its efficiency, speed, and strong interpretability, the framework can be widely applied to the prediction and influencing-factor analysis of other similar infectious diseases, and it could provide scientific support for the management, prevention, and control of such epidemics.

Methods: An interpretable ensemble framework based on machine learning was designed to forecast daily new cases of COVID-19 in the United States and to identify the important factors related to COVID-19. The framework was divided into four layers. The first layer was data preparation, including the number of cases and four types of features (self-protection, social prevention and control, community mobility, and time index). The second layer established three decision-tree-based machine learning models, namely random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), and used Hyperopt to optimize their hyperparameters. The third layer established three linear ensemble models, namely simple average (SA), ordinary least squares (OLS), and least absolute deviation (LAD), to combine the base-model outputs for better prediction accuracy. The fourth layer used SHapley Additive exPlanations (SHAP) values to obtain feature importance rankings that explain both the machine learning models and the linear ensemble models. Root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were selected to evaluate the fitting and prediction performance of all models.

Results: Among the three base machine learning models in the second layer of the ensemble framework, prediction accuracy in descending order was LightGBM, XGBoost, and RF. The three ensemble models in the third layer performed better than the base models at data inflection points. The optimized LAD ensemble was the most accurate prediction model; it reduced the MAE of the best base learner (LightGBM) by approximately 3.111%, and its MAPE was as low as 6.088%. Meanwhile, indicators of vaccination, mask wearing, mobility, and social prevention and control ranked high in the SHAP-based importance ranking.

Conclusions: Within the interpretable ensemble framework of this study, the optimized LAD ensemble had the best prediction accuracy. Indicators of vaccination, mask wearing, reduced mobility, and social prevention and control had a positive effect on the prevention and control of COVID-19.
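As a reading aid, the following is a minimal, illustrative Python sketch of the four-layer workflow described above (tree-based base learners, linear ensembles, RMSE/MAE/MAPE evaluation, and SHAP importance). It is not the thesis code: the data are synthetic placeholders, the Hyperopt tuning step is omitted, and all model settings are assumed defaults rather than the tuned values reported in the study.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
import xgboost as xgb
import lightgbm as lgb
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg
import shap

rng = np.random.default_rng(0)

# Layer 1: data preparation. Synthetic stand-in for the four feature groups
# (self-protection, social prevention and control, community mobility, time index)
# and for the daily new-case counts used in the thesis.
n, p = 500, 8
X = rng.normal(size=(n, p))
y = 100.0 + X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)
split = int(0.8 * n)
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

# Layer 2: three tree-based base learners (Hyperopt tuning omitted here).
base_models = {
    "RF": RandomForestRegressor(n_estimators=300, random_state=0),
    "XGBoost": xgb.XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=0),
    "LightGBM": lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05, random_state=0),
}
P_tr, P_te = [], []
for name, model in base_models.items():
    model.fit(X_tr, y_tr)
    P_tr.append(model.predict(X_tr))
    P_te.append(model.predict(X_te))
P_tr, P_te = np.column_stack(P_tr), np.column_stack(P_te)

# Layer 3: linear ensembles of the base predictions (SA, OLS, LAD).
sa_pred = P_te.mean(axis=1)                                  # simple average
ols_fit = sm.OLS(y_tr, sm.add_constant(P_tr)).fit()          # ordinary least squares
ols_pred = ols_fit.predict(sm.add_constant(P_te))
lad_fit = QuantReg(y_tr, sm.add_constant(P_tr)).fit(q=0.5)   # LAD via median regression
lad_pred = lad_fit.predict(sm.add_constant(P_te))

# Evaluation: RMSE, MAE, MAPE on the held-out split.
def report(name, y_true, y_hat):
    rmse = np.sqrt(mean_squared_error(y_true, y_hat))
    mae = mean_absolute_error(y_true, y_hat)
    mape = np.mean(np.abs((y_true - y_hat) / y_true)) * 100
    print(f"{name}: RMSE={rmse:.3f}  MAE={mae:.3f}  MAPE={mape:.2f}%")

for name, y_hat in (("SA", sa_pred), ("OLS", ols_pred), ("LAD", lad_pred)):
    report(name, y_te, y_hat)

# Layer 4: SHAP feature importance for one base learner (mean |SHAP| per feature).
explainer = shap.TreeExplainer(base_models["LightGBM"])
shap_values = explainer.shap_values(X_te)
importance = np.abs(shap_values).mean(axis=0)
print("Feature importance ranking (indices):", np.argsort(importance)[::-1])

In this sketch the LAD ensemble is fitted as median (quantile 0.5) regression on the base-model predictions, one standard way to obtain a least-absolute-deviation combiner; unlike the OLS combiner, it is less sensitive to large errors such as those that occur around data inflection points.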
Keywords/Search Tags:COVID-19, LightGBM, ensemble framework, interpretability, disease prediction