Diabetes is known as a health killer. Globally, the number of people with diabetes has been increasing year by year, with an average annual growth rate of 51%. In 2019, our country spent about US$109 billion on medical expenses for treating diabetes, and the disease puts many families under tremendous economic pressure. Diabetes has no obvious symptoms in its early stage, and patients often realize its dangers only when complications appear at a late stage; prevention is therefore the key to managing diabetes. A machine-learning-based diabetes prediction model can be used for preliminary screening and risk assessment, which is valuable for early prevention. In this paper, we use three machine learning algorithms (logistic regression, random forest, and artificial neural network) and implement the diabetes prediction models in Python. The main findings are as follows:
(1) Data preprocessing: Inpatient data of clinical diabetic patients were collected from tertiary hospitals. The risk factors of diabetes were analyzed with medical knowledge, the dimensionality of the data was reduced, and a random forest stepwise regression algorithm was used to screen the features. The selected data were divided into a training set, a validation set, and a test set in a ratio of 3:1:1. The training set is used to train the models, the validation set to tune the parameters, and the test set to evaluate model performance.
(2) Construction of the diabetes risk prediction model: Logistic regression, random forest, and artificial neural network were used to build disease risk prediction models for judging diabetes risk. The test set was used to verify the effectiveness of the three models, and their prediction results were compared. The model with the best prediction performance was selected as the algorithm model for the prediction system. The results show that the test-set accuracy was 75% for logistic regression, 80% for random forest, and 77% for the artificial neural network; random forest outperformed logistic regression and the artificial neural network in accuracy, recall, and F1 score on the test samples. The false positive rate (misdiagnosis rate) of the artificial neural network was 8.7%, lower than that of logistic regression (11.6%) and random forest (11.2%), so this model also has some reference value in actual clinical guidance.
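The 3:1:1 split and the evaluation measures named above (accuracy, recall, F1, false positive rate) can be illustrated with a minimal Python sketch. It uses only the standard library and is not the paper's implementation: the function names `split_3_1_1` and `binary_metrics`, the fixed random seed, the toy labels, and the reading of the false positive rate as FP / (FP + TN) are all assumptions made for illustration.

```python
import random


def split_3_1_1(records, seed=0):
    """Shuffle records and split them into training, validation and
    test sets in a 3:1:1 ratio (hypothetical helper, not from the paper)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 3 // 5          # three fifths for training
    n_val = n // 5                # one fifth for validation
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder for testing
    return train, val, test


def binary_metrics(y_true, y_pred):
    """Accuracy, recall, F1 and false positive rate for 0/1 labels,
    where 1 is assumed to mean 'diabetic'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Assumed definition: share of non-diabetic cases flagged as diabetic.
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }


if __name__ == "__main__":
    # Toy data for illustration only; not the hospital data set.
    train, val, test = split_3_1_1(range(100))
    print(len(train), len(val), len(test))
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```

Reporting the false positive rate alongside accuracy, recall, and F1 is what allows one model to lead on overall accuracy while another produces fewer misdiagnoses, as in the results above.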