| Objective: Primary prevention is the principal strategy for reducing the burden of stroke. The integration of traditional Chinese medicine (TCM) and Western medicine has been proven effective in the primary prevention of stroke. Developing a new-onset stroke prediction tool with TCM characteristics might improve current primary stroke prevention. Syndrome elements are groups of symptoms classified under the guidance of TCM theory. Therefore, we aimed to use syndrome elements and traditional risk factors together as predictive variables to develop a model with TCM characteristics for predicting the risk of new-onset stroke. Methods: Based on a database of populations at high risk of stroke, a retrospective cohort study was performed in individuals who had not experienced stroke. The dependent variable was the first stroke event within 10 years of follow-up, and the independent variables were candidate predictors, including age, gender, systolic blood pressure, antihypertensive therapy, diabetes, total cholesterol (TC), triglycerides (TG), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), carotid atherosclerosis, current smoking, qi-deficiency, yang-deficiency, yin-deficiency, fire, phlegm-dampness, and blood-stasis. Using a random split, the dataset was divided into a training set and a validation set at a ratio of 7:3. Oversampling, undersampling, and Random Over-Sampling Examples (ROSE) were used to balance the positive and negative samples of the training set. Univariate analysis and stepwise regression were performed to select the predictive variables. Logistic regression and XGBoost were used to develop the models. AUC and Brier scores were used to evaluate model performance. Finally, we constructed a nomogram presenting individual predictions of new-onset stroke. Results: A total of 1783 individuals at high risk of stroke were included (1248 in the training set and 535 in the validation set), including 110 patients who were diagnosed with new-onset
stroke during the 10-year follow-up. For variable selection, based on the results of univariate analysis and stepwise regression, the logistic regression model included eight predictors: gender, age, systolic blood pressure, diabetes, HDL-C, carotid atherosclerosis, current smoking, and fire. Both the oversampling and ROSE XGBoost models included seven predictors: systolic blood pressure, carotid atherosclerosis, fire, HDL-C, age, antihypertensive therapy, and current smoking. The undersampling XGBoost model included these seven predictors plus diabetes and gender. For model performance, the training set AUC of the ROSE logistic regression model was 0.746 (95% CI 0.719-0.774), and the validation set AUC was 0.658 (95% CI 0.572-0.745); the training set AUC of the oversampling XGBoost model was 0.836 (95% CI 0.821-0.852), and the validation set AUC was 0.644 (95% CI 0.553-0.735). Compared with the logistic regression model, the XGBoost model performed better in the training set (P<0.001). There was no statistically significant difference in validation set AUC between the two models (P=0.646), but the logistic regression model was more clinically interpretable. The logistic regression model suggested that fire was a significant stroke risk factor (OR=1.93, 95% CI 1.50-2.49, P<0.001); other risk factors included older age, high systolic blood pressure, carotid atherosclerosis, current smoking, diabetes, and female gender, while a high HDL-C level was a protective factor (OR=0.58, 95% CI 0.41-0.82, P=0.002). The model formula is Logit(p) = -2.36 + 0.190 × female + 0.0214 × age + 0.0192 × systolic blood pressure + 0.286 × diabetes - 0.549 × HDL-C + 1.13 × carotid atherosclerosis + 0.604 × current smoker + 0.657 × fire. The optimal cut-off value for the nomogram of this prediction model is 180 points; individuals at high risk of stroke who score above 180 have a high probability of new-onset stroke. Discussion: (1) Fire (a syndrome element of TCM), older age, high systolic blood
pressure, carotid atherosclerosis, current smoking, diabetes, and female gender might be risk factors for new-onset stroke, and a high HDL-C level might be a protective factor. Our prediction model provides new insights for predicting and preventing new-onset stroke in China. (2) The evolution of TCM pathogenesis in populations at high risk of stroke is as follows: as age increases and function declines, healthy individuals gradually enter a high-risk state for stroke, with qi deficiency as the initiating factor; physiological substances such as qi, blood, and body fluids transform into pathological products such as qi stagnation, blood stasis, and phlegm-dampness; these promote the formation of fire, and phlegm, blood stasis, and fire then transform into fire poison, causing a stroke. |
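
The logistic regression formula reported in the Results can be evaluated directly, which may help when checking the reported odds ratios against the coefficients. The following is a minimal sketch, not the authors' implementation. The coefficients and intercept are copied from the abstract; everything else is an assumption made for illustration: the dictionary keys, the function names, the 0/1 coding of the binary predictors, and the input units (age in years, systolic blood pressure in mmHg, HDL-C in mmol/L). The abstract states neither the units nor any scaling applied to the continuous predictors, so the absolute probabilities this sketch returns should not be read as clinical risk estimates. The training set was also rebalanced, which may shift absolute probabilities away from the observed incidence.

```python
import math

# Intercept and coefficients copied from the formula in the abstract.
# Key names are hypothetical labels chosen for this sketch.
INTERCEPT = -2.36
COEFFICIENTS = {
    "female": 0.190,                  # assumed coding: 1 = female, 0 = male
    "age": 0.0214,                    # assumed unit: years
    "sbp": 0.0192,                    # assumed unit: mmHg
    "diabetes": 0.286,                # assumed coding: 1 = yes, 0 = no
    "hdl_c": -0.549,                  # assumed unit: mmol/L
    "carotid_atherosclerosis": 1.13,  # assumed coding: 1 = yes, 0 = no
    "current_smoker": 0.604,          # assumed coding: 1 = yes, 0 = no
    "fire": 0.657,                    # TCM syndrome element, assumed 1 = present
}


def stroke_logit(predictors):
    """Return the linear predictor Logit(p) for one individual."""
    return INTERCEPT + sum(
        coef * predictors[name] for name, coef in COEFFICIENTS.items()
    )


def stroke_probability(predictors):
    """Map the linear predictor to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-stroke_logit(predictors)))


def odds_ratio(name):
    """Odds ratio for a one-unit increase in a predictor: exp(coefficient)."""
    return math.exp(COEFFICIENTS[name])


# Hypothetical individual, values invented purely for illustration.
example = {
    "female": 0, "age": 65, "sbp": 150, "diabetes": 0, "hdl_c": 1.2,
    "carotid_atherosclerosis": 1, "current_smoker": 1, "fire": 0,
}

if __name__ == "__main__":
    print(f"OR for fire:  {odds_ratio('fire'):.2f}")
    print(f"OR for HDL-C: {odds_ratio('hdl_c'):.2f}")
    print(f"Logit(p) for the example: {stroke_logit(example):.3f}")
```

Exponentiating the fire and HDL-C coefficients reproduces the odds ratios reported in the abstract (1.93 and 0.58), which is a quick consistency check on the transcribed formula.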