| Complications of hypertension are a major threat to public health in our country. Their causes are complex, they are difficult to cure, and their treatment is costly. The most effective way to prevent them is to identify the high-risk population and intervene as early as possible, so studying how to identify individuals at high risk of hypertension complications is of great social significance. This paper uses machine learning to build a stroke risk prediction model for the early identification of high-risk individuals. The main contents of this paper are as follows: (1) Theoretical research on disease prediction models. Machine learning is an important method in disease prediction: it can learn from large amounts of data to predict disease risk. We choose the classical machine learning algorithm logistic regression and the ensemble learning algorithms that have been widely used in recent years as the algorithms for building the stroke risk prediction model. (2) Data preprocessing and feature selection. Physical examination data from the First People's Hospital of Q City in Yunnan Province were collected and preprocessed, and the features closely related to stroke were preliminarily selected. Stepwise regression analysis and variable tests were then used to obtain two groups of feature indexes for subsequent modeling: the single obesity index BMI together with other indexes, and the combined obesity indexes BMI and WHR together with other indexes. (3) Establishment of six stroke risk prediction models. Using each of the two groups of feature indexes, stroke risk prediction models were built with logistic regression, random forest, and XGBoost. The models were trained on the training set, and the validation set was used to determine the model structure and parameters. Finally, through theoretical analysis and a comparison of the establishment process and prediction performance of the six 
models, the optimal stroke risk prediction model for the research data set is obtained. The conclusions are as follows: (1) Compared with the single obesity index BMI, using the combined obesity indexes BMI and WHR in the stroke risk prediction model improves its predictive ability. Relative to the models built with the single obesity index and other indexes, the models built with the combined obesity indexes and other indexes improved in predictive accuracy by 3.53% for logistic regression, 4.23% for random forest, and 7.75% for XGBoost. (2) The XGBoost stroke risk prediction model based on BMI, WHR, and other indexes has the best prediction performance. Its selected parameters are n_estimators = 92, learning_rate = 0.01, max_depth = 7, gamma = 0.3158, and subsample = 1. With these settings, the model's AUC is 0.8122, accuracy is 0.8169, sensitivity is 0.7945, and specificity is 0.8406. The innovations of this paper are: (1) Based on a statistical analysis of the feature indexes in the data set, the indexes closely related to stroke are identified, and the combined obesity indexes BMI and WHR are used as the obesity indexes of the stroke risk prediction model. (2) A recent machine learning algorithm (XGBoost) is used to build the stroke risk prediction model, achieving better prediction performance. (3) The model parameters are tuned in greater detail, and the relationship between parameter tuning and model performance is explored. |
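
As an illustration of the evaluation measures reported above, the following is a minimal sketch of how accuracy, sensitivity, and specificity can be computed from binary predictions. It is not the paper's code: the function name `binary_metrics`, the constant `REPORTED_XGB_PARAMS`, and the toy labels are assumptions made for this example. The parameter values are those reported in the abstract, and the key names are assumed to correspond to the parameters of XGBoost's scikit-learn style classifier.

```python
# Hyperparameter values as reported in the abstract. The key names are
# assumed to match the scikit-learn style XGBoost classifier parameters.
REPORTED_XGB_PARAMS = {
    "n_estimators": 92,
    "learning_rate": 0.01,
    "max_depth": 7,
    "gamma": 0.3158,
    "subsample": 1,
}


def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for 0/1 labels.

    Label 1 is taken to mean "stroke" and 0 "no stroke" (an assumption
    made for this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Avoid division by zero when a class is absent from y_true.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Hypothetical toy labels, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity is the fraction of actual stroke cases the model flags, and specificity is the fraction of non-stroke cases it correctly clears, so the two are usually read together with accuracy when classes are imbalanced.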