Objective: To collect the basic demographic characteristics, clinical features, laboratory tests and imaging information of patients with acute ischemic stroke, and to develop a machine learning (ML) model for predicting post-ischemic stroke epilepsy (PSE), so that high-risk patients can be identified more accurately and clinicians can be better supported in making diagnostic and treatment decisions.

Methods: Patients with acute ischemic stroke admitted to the Affiliated Hospital of Qingdao University between January 2018 and December 2019 formed the derivation cohort used to develop the prediction model. Patients hospitalized in Qingdao Municipal Hospital between January and December 2019 formed the validation cohort. Thirty-two features covering basic demographic characteristics, clinical features, laboratory tests and imaging information were collected retrospectively. Using Python, the derivation cohort was randomly split into a training set (70%) and a test set (30%). The training set was used to develop the prediction models and the test set to verify them. Two problems in the training set were then addressed: missing values were filled in by multiple imputation, and class imbalance was corrected with the SMOTE + Tomek Links technique. The Boruta algorithm was used to select the important feature variables for model construction. Prediction models for PSE were built with six algorithms: logistic regression (LR), naive Bayes (NB), support vector machine (SVM), multilayer perceptron (MLP), adaptive boosting (AdaBoost) and gradient boosting decision tree (GBDT). The predictive performance of the ML models was evaluated and compared with that of the SeLECT score, which was developed with traditional statistical regression. Discrimination was assessed by the area under the receiver operating characteristic curve (AUC) and calibration by the Brier score. The DeLong test was used to compare the AUC of the best ML model with that of the SeLECT score. Accuracy, sensitivity and specificity were also reported. Finally, the SHapley Additive exPlanations (SHAP) method was used to interpret the ML model built on real-world hospital data, with the aim of better guiding clinical practice.

Results:
1. Of the 2847 patients with acute ischemic stroke admitted during the study period, 133 (4.7%) had PSE. In the derivation cohort, 87 of the 1977 patients analysed (4.4%) were identified as having PSE. In the validation cohort, 46 of the 870 patients analysed (5.3%) had PSE.
2. The Boruta method selected four important features: NIHSS score, length of stay, D-dimer and cortical involvement. Among the ML algorithms, the NB model performed best, with an AUC of 0.757, a Brier score of 0.126, an accuracy of 0.722, a sensitivity of 73.9% and a specificity of 72.0%. Compared with the SeLECT score, the NB model showed slightly better discrimination (AUC 0.757 vs 0.732), although the difference was not statistically significant (DeLong P = 0.61). Both models showed satisfactory calibration, but the NB model was less well calibrated than the SeLECT score (Brier score 0.126 vs 0.048; lower is better). In terms of clinical utility, the SeLECT score achieved high specificity (97.6%) at its absolute risk thresholds. However, its very low sensitivity (4.3%) would make it difficult to apply in clinical practice. Weighing discrimination, calibration and clinical utility together, the NB model was therefore judged superior to the SeLECT score.
3. SHAP analysis ranked the four features in descending order of importance as NIHSS score, length of stay, D-dimer and cortical involvement. As the NIHSS score increased, the model increasingly predicted that the patient would develop PSE. Longer hospital stays and higher D-dimer levels were likewise associated with a higher predicted risk of PSE, as was cortical involvement.

Conclusion: This study developed and validated a comprehensive model for predicting the risk of PSE. The prediction model based on the NB algorithm performed best among the ML models and was superior to the SeLECT score for predicting PSE. SHAP analysis explained the decision weight and direction of effect of each feature in the NB model, with the aim of better guiding clinical decision-making.
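The sketch below illustrates the SMOTE + Tomek Links balancing step in plain Python. It is a simplified illustration under stated assumptions, not the study's pipeline, and library implementations such as imbalanced-learn's `SMOTETomek` offer many more options.

The assumptions are:
* PSE is coded as 1 and is the minority class.
* Distance is Euclidean.
* SMOTE oversamples until both classes are equal in size.
* Both members of every Tomek link (a pair of mutual nearest neighbours with different labels) are removed.

The function names `smote`, `tomek_links` and `smote_tomek` are hypothetical. As in the Methods, resampling would be applied only to the training set.

```python
import math
import random
from typing import List, Optional, Sequence, Set, Tuple

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Point]:
    """Create n_new synthetic minority points by interpolating towards minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself)
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def tomek_links(X: Sequence[Point], y: Sequence[int]) -> Set[int]:
    """Indices of points in a Tomek link: mutual nearest neighbours with different labels."""
    nearest = [
        min((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(len(X))
    ]
    linked: Set[int] = set()
    for i, j in enumerate(nearest):
        if y[i] != y[j] and nearest[j] == i:
            linked.update((i, j))
    return linked


def smote_tomek(X: Sequence[Point], y: Sequence[int], k: int = 5,
                seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Oversample class 1 with SMOTE until balanced, then drop both members of each Tomek link."""
    rng = random.Random(seed)
    minority = [list(x) for x, label in zip(X, y) if label == 1]
    n_majority = sum(1 for label in y if label == 0)
    synthetic = smote(minority, max(0, n_majority - len(minority)), k, rng)
    X_res = [list(x) for x in X] + synthetic
    y_res = list(y) + [1] * len(synthetic)
    drop = tomek_links(X_res, y_res)
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]
```

Each Tomek link pairs one point from each class, so removing both members keeps the two classes equal in size after SMOTE has balanced them. The alternative convention of removing only the majority-class member would instead leave slightly more minority points.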
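The evaluation metrics named in the Methods can be made concrete with a short, dependency-free Python sketch. This is an illustration, not the code used in the study. It computes three things:
* The AUC, as the Mann-Whitney probability that a randomly chosen PSE patient receives a higher predicted risk than a randomly chosen non-PSE patient, with ties counted as one half.
* The Brier score, as the mean squared difference between predicted probability and observed outcome.
* Accuracy, sensitivity and specificity, at a single risk threshold.

The function names and the default threshold of 0.5 are assumptions made for illustration.

```python
from typing import Dict, Sequence


def auc_mann_whitney(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity when risk >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate among PSE patients
        "specificity": tn / (tn + fp),  # true-negative rate among non-PSE patients
    }
```

For the invented labels `[0, 0, 1, 1]` and predicted risks `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75. These toy values are not study data.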
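The DeLong test compares two AUCs measured on the same patients while accounting for the correlation between them. The sketch below implements DeLong's nonparametric variance estimate in plain Python and returns both AUCs, the z statistic and a two-sided P value. It is a hypothetical helper, not the study's code.

It assumes that both models (for example the NB model and the SeLECT score) give a risk score for every patient in one evaluation set. It also assumes that PSE is coded as 1 and that each class has at least two patients.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, z: float) -> float:
    # Kernel: 1 if the positive case outranks the negative case, 1/2 on ties, else 0
    return 1.0 if x > z else (0.5 if x == z else 0.0)


def _placements(y_true: Sequence[int], score: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Structural components: V10 (one per positive case) and V01 (one per negative case)."""
    pos = [s for y, s in zip(y_true, score) if y == 1]
    neg = [s for y, s in zip(y_true, score) if y == 0]
    v10 = [sum(_psi(x, z) for z in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, z) for x in pos) / len(pos) for z in neg]
    return v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    # Sample covariance with denominator len - 1
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true: Sequence[int], score_a: Sequence[float],
                score_b: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (AUC_a, AUC_b, z, two-sided P) for two correlated ROC curves."""
    v10_a, v01_a = _placements(y_true, score_a)
    v10_b, v01_b = _placements(y_true, score_b)
    m, n = len(v10_a), len(v01_a)  # numbers of positive and negative cases
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    # Var(AUC_a - AUC_b) = [S10_aa + S10_bb - 2 S10_ab] / m + [S01_aa + S01_bb - 2 S01_ab] / n
    var_diff = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    )
    diff = auc_a - auc_b
    if var_diff <= 0:
        if diff == 0:
            return auc_a, auc_b, 0.0, 1.0  # identical rankings: no evidence of a difference
        raise ValueError("degenerate variance estimate")
    z = diff / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p
```

A large P value, such as the P = 0.61 reported for NB vs SeLECT, means the observed AUC gap is well within what sampling variation alone could produce on a cohort of this size.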