Objectives: In recent years, healthcare costs in China have increased rapidly. Readmission rates have remained high, and the per capita cost of hospitalization is high. Analyses of medical expenses show that a small number of patients account for most of the spending. Chronic hepatitis B is a major health problem in China; its treatment is a long-term process that can lead to various complications and high medical costs. A predictive model that identifies high-risk patients helps patients understand their health status in advance, so that early treatment and intervention can slow disease progression, ease economic pressure, and support a more reasonable allocation of medical resources. This study therefore uses clinical treatment cohort data to build accurate, personalized models that predict the risk of admission and of high direct medical costs over the next 12 months for patients with chronic hepatitis B and cirrhosis. We focus on the early stages of disease development, strengthening intervention and treatment to prevent the patient's condition from worsening, and we provide data-based evidence for optimizing the allocation of medical resources and controlling the cost of the disease.

Methods: This was a retrospective cohort study. Medical information for patients diagnosed with "hepatitis B" and "cirrhosis" from 2011 to 2017 was collected from the information system of a specialist hospital for infectious diseases in Guangzhou. The data set was randomly divided into a training set (70%) and a validation set (30%). To address class imbalance, the SMOTE algorithm was applied to the original training set to obtain a class-balanced training set. Admission in the following year and high direct medical expenses in the following year (the top 5% of total medical expenses) were used as the outcome variables, and the
demographic characteristics, biochemical examination, and treatment plans were used as independent variables to establish the model. Multivariate stepwise logistic regression and random forest variable-importance screening were used to construct the clinical prediction models. The optimal prediction model was selected using six evaluation indicators (sensitivity, specificity, F1-measure, G-mean, AUC, and calibration curve) and verified on the validation set (30%). Finally, a simple and practical nomogram was established for the optimal prediction model.

Results:
1. This study included 27,736 patients with chronic hepatitis B covered by medical insurance, of whom 400 (1.44%) were admitted the following year and 1,565 (5.64%) had high direct medical expenses the following year. Annual direct medical expenses per capita were 5,768.89 RMB, and the threshold for high direct medical expenses was 14,994.56 RMB. Of the 7,022 patients with cirrhosis, 602 (8.57%) were admitted the following year and 179 (2.55%) had high direct medical expenses the following year. Average annual direct medical expenses per patient were 10,331.48 RMB, and the threshold for high direct medical expenses was 32,529.69 RMB.
2. The prediction models built on the class-imbalanced data set have low sensitivity and high specificity. Some models have a specificity of 88.5% but a sensitivity of only 65.3%; the F1-measure generally does not exceed 30% and is as low as 5.39% in some models. The G-mean is generally about 70%, the AUC is mostly between 0.7 and 0.8, and the calibration curve is densely distributed below the diagonal or deviates from it. In contrast, models built with the same analysis methods on the class-balanced data set produced by the SMOTE algorithm perform significantly better. Although their specificity is lower than that of the models based on the class-imbalanced training set, their sensitivity is higher. The F1-measure and G-mean have
generally reached 80% and the AUC has increased, and the calibration curve lies closer to the diagonal.
3. The next-year admission model for chronic hepatitis B included age, hospitalization, lamivudine, entecavir, telbivudine, and DNA as variables. The AUCs of the training and validation sets were 0.846 and 0.852, respectively. The next-year high direct medical expenses model for chronic hepatitis B included hospitalization, indirect bilirubin, interferon, hepatoprotective drugs, total bilirubin, and alanine aminotransferase. The AUCs of the training and validation sets were 0.847 and 0.671, respectively. The next-year admission model for cirrhosis included hospitalization, total protein, alanine aminotransferase, albumin, aspartate aminotransferase, and age. The AUCs of the training and validation sets were 0.944 and 0.787, respectively. The next-year high direct medical costs model for cirrhosis included total protein, entecavir, hospitalization, age, aspartate aminotransferase, and albumin. The AUCs of the training and validation sets were 0.963 and 0.857, respectively.

Conclusions:
1. Models trained on the class-balanced data set produced by the SMOTE algorithm perform better than those trained on the original data set, and their recognition rates for positive and negative samples tend to be balanced.
2. This study constructed four predictive models from 29 variables covering demographic characteristics, biochemical examination, drug use, and treatment plans. These models perform well and have some application value.
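
Four of the evaluation indicators named in the Methods (sensitivity, specificity, F1-measure, and G-mean) follow directly from a confusion matrix. The sketch below is illustrative only and is not the study's code: the function name `classification_metrics` and the counts are hypothetical, chosen to mimic a rare outcome rather than taken from the study data. It shows how a model can pair moderate sensitivity and high specificity with a low F1-measure when positive cases are scarce.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, F1-measure and G-mean
    from the four cells of a binary confusion matrix."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    # G-mean is the geometric mean of sensitivity and specificity
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts (assumption): 20 positives among 1,020 cases
metrics = classification_metrics(tp=13, fp=115, tn=885, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because precision enters the F1-measure, a large number of false positives relative to the few true positives pulls F1 down even when sensitivity and specificity look acceptable, which is one reason to report the G-mean alongside it for imbalanced outcomes.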