Objective: The incidence of ischemic stroke in China is increasing every year, and the risk of recurrence and vascular death after a first ischemic stroke is high. This study collected, organized, and analyzed the inpatient medical records of patients clinically diagnosed with ischemic stroke, including those with recurrence, in order to identify the factors that affect ischemic stroke recurrence (ISR) and to establish predictive models for ISR using data mining technology. Such models can help clinicians formulate secondary prevention plans for ISR at an early stage.

Methods: (1) Medical history data were collected and summarized for 300 patients diagnosed with ischemic stroke or its recurrence, including imaging examination results, diagnosis and treatment plans, and discharge diagnoses, and a database was established. (2) Table fields were defined to record the clinical data, and a preliminary statistical analysis of the collected medical record data was conducted with respect to stroke recurrence to identify the main influencing factors. (3) Based on the new representative indicators formed by principal component analysis, data mining classification algorithms were used to establish recurrence prediction models for ischemic stroke rehabilitation: a logistic regression model, a support vector machine model, an artificial neural network model, and decision tree models. The performance of each model was evaluated by accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the ROC curve. (4) Finally, based on the patient medical record data, the best-performing models were selected for predictive analysis of the factors influencing ischemic stroke recurrence, in order to guide clinicians and patients in the secondary prevention of ischemic stroke and to provide a basis for decisions on the best diagnosis and treatment plan and comprehensive rehabilitation plan. In clinical practice, the accuracy and performance of the models have been further improved.

Results: (1) Recurrence of ischemic stroke was significantly associated (P<0.05) with the following factors: interval between attacks, hospitalization department, rehabilitation days, hospitalization frequency, medical payment method, marital status, occupation, case classification, transfer from the ICU, ventilator use, antibiotic use, abnormal lactate dehydrogenase, thyroid function, previous stroke, previous hepatitis B, smoking history, smoking duration, concurrent secondary epilepsy, concurrent urinary tract infection, NIHSS score, location of the first stroke, type of the first stroke, location of the lesion, number of lesions, thalamus, corona radiata, brain stem, leukoaraiosis, atheromatous plaque formation, degree of stenosis of the common carotid artery, internal carotid artery, vertebral artery, middle cerebral artery, posterior cerebral artery, basilar artery, and posterior communicating artery, lower limb venous thrombosis, tricuspid regurgitation, mitral regurgitation, pulmonary hypertension, aortic regurgitation, left ventricular enlargement, interventricular septal hypertrophy, right ventricular hypertrophy, whether the EEG was normal, comprehensive training of hemiplegic limbs, neurofacilitation techniques, hyperbaric oxygen therapy, medium-frequency electrotherapy, ordinary acupuncture plus electroacupuncture, special treatment of the pharynx, and transcranial magnetic stimulation. (2) Principal component analysis condensed 183 independent variables into 57 new variables, which represent the majority of the information and are independent of each other. (3) For each model, the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value are reported in that order, followed by the area under the ROC curve (AUC). Logistic regression model: 93.3%, 94.24%, 92.54%, 91.61%, and 94.90%; AUC 0.934. Classification and regression tree (CART) model: 93.46%, 94.20%, 92.86%, 91.55%, and 95.12%; AUC 0.956. Quick, unbiased, efficient statistical tree (QUEST) model: 90.2%, 93.75%, 87.64%, 84.51%, and 95.12%; AUC 0.925. Chi-squared automatic interaction detection (CHAID) model: 88.89%, 92.19%, 86.52%, 83.10%, and 93.90%; AUC 0.938. Artificial neural network model: 94.12%, 93.06%, 95.06%, 94.37%, and 93.90%; AUC 0.926. Support vector machine model: 85.62%, 85.51%, 85.71%, 83.10%, and 87.80%; AUC 0.926.

Conclusion: (1) This study found that the occurrence of ISR is related to the interval between two attacks, the severity of the disease, comprehensive rehabilitation treatment and rehabilitation time, the efficacy of hospitalization, the presence of organic heart disease, deterioration of the major arteries supplying the brain, the location and number of lesions, complications, factors that aggravate arteriosclerosis, factors that aggravate venous thrombosis, economic and living environment factors, abnormal serological indicators, and previous illness or chronic diseases such as hepatitis B. (2) Principal component analysis can transform the complex independent variables among the influencing factors of ISR into a smaller set of effective variables that reflect most of the information in the original variables without duplicating it. (3) On the basis of principal component analysis, six prediction models were successfully constructed, and their performance was relatively good. Among them, the classification and regression tree model had the best overall performance and the logistic regression model also performed well; the two models can be combined. The artificial neural network model was also a relatively good model.
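
The six evaluation measures named in the Methods can be illustrated with a minimal Python sketch. It is not the study's code: the function names are hypothetical, and the confusion-matrix counts in the usage lines are an assumption, chosen only because they are arithmetically consistent with the percentages reported for the classification and regression tree model (the abstract does not give the actual counts). The AUC helper uses the pairwise (Mann-Whitney) definition on a toy set of scores.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return the five threshold-based measures from confusion-matrix counts.

    tp/fn: recurrent cases predicted as recurrent / non-recurrent.
    tn/fp: non-recurrent cases predicted as non-recurrent / recurrent.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical counts (assumption, see the note above).
metrics = classification_metrics(tp=65, fp=6, tn=78, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")

# Toy scores, purely illustrative.
print(auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Reporting all six measures together matters because accuracy alone can hide an imbalance between missed recurrences (low sensitivity) and false alarms (low specificity).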