| In recent years, with rising living standards and changing lifestyles, the incidence of diabetes, a chronic disease, has been increasing, and diabetes has become one of the major diseases threatening residents' health. However, prediction models for estimating diabetes risk in China are still relatively scarce. It is therefore of considerable practical significance to establish a diabetes risk prediction model suited to the Chinese population, which can support clinical decision-making and help identify high-risk groups. This paper reviews research on diabetes prediction in China and abroad and summarizes previous work. Physical examination data collected from July 2016 to June 2017 at a Grade-A tertiary hospital in Yunnan Province were then analyzed. Thirteen variables were included in the study, among them sex, age, body mass index (BMI), family history of diabetes, triglycerides (TG), total cholesterol (TC) and C-reactive protein (CRP). The samples were randomly divided into two groups: about 70% (4064 cases) formed the total training set and about 30% (1629 cases) formed the test set. For the neural network, to prevent over-fitting, the total training set was further randomly divided at a ratio of 7:3 into a training set (2845 cases) and an internal test set (1219 cases). Three diabetes prediction models were then built: a logistic regression model, a BP (back-propagation) neural network, and a BP neural network using only the variables screened by logistic regression. The predictive performance of each model was evaluated by sensitivity, specificity, accuracy and the area under the ROC curve (AUC). Finally, the generalization ability and stability of the models were assessed by varying the proportion of the test set. Multivariate logistic regression analysis showed that, of the 13 variables studied, age (P = 0.005, OR = 1.225), family history of diabetes (P = 0.013, OR = 1.628), BMI (P = 0.003, OR = 2.066), TG (P < 0.001, OR = 1.146), high-density lipoprotein cholesterol (HDL-C; P < 0.001, OR = 0.550), low-density lipoprotein cholesterol (LDL-C; P = 0.007, OR = 0.861) and CRP (P < 0.001, OR = 1.007) were significantly associated with type 2 diabetes mellitus (T2DM). The logistic regression model achieved an accuracy of 85.0% and an AUC of 0.764 (95% CI: 0.749–0.780). In the BP neural network, the five predictors with the highest normalized importance were BMI (100.0%), TG (36.2%), HDL-C (31.5%), LDL-C (31.1%) and TC (27.8%), and its predictive performance (accuracy = 89.6%, AUC = 0.826, 95% CI: 0.816–0.835) was superior to that of the logistic regression model. The BP neural network using logistic-screened variables performed best, with an accuracy of 92.1% and an AUC of 0.846 (95% CI: 0.837–0.855). When the test-set proportion was varied from 10% to 50%, the AUCs of all three models changed little, indicating good stability; the BP neural network using logistic-screened variables had the largest AUC, followed by the BP neural network, and the logistic regression model had the smallest. Therefore, BP neural network models offer better predictive performance for the risk of T2DM, while the logistic regression model explains the contribution of each variable better than BP neural networks, so the two models can be combined in practical applications. |
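
The two-stage random split described above can be sketched as follows. The helper name `random_split` and the seeds are hypothetical. The outer cut uses the reported case counts directly rather than an exact 70% fraction. The abstract does not say whether the split was stratified by diabetes status, so this sketch is not stratified.

```python
import random

def random_split(indices, n_first, seed):
    """Shuffle a copy of `indices` and cut it into two disjoint lists."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return shuffled[:n_first], shuffled[n_first:]

# Total sample size implied by the reported counts: 4064 + 1629 = 5693.
N_TOTAL = 4064 + 1629

# Outer split: total training set (about 70%) vs. test set (about 30%).
train_all, test = random_split(range(N_TOTAL), 4064, seed=1)

# Inner split, used only for the neural network: 7:3 within the total
# training set, the smaller part serving to monitor over-fitting.
n_inner = round(0.7 * len(train_all))  # 2845
train, inner_test = random_split(train_all, n_inner, seed=2)

print(len(train_all), len(test), len(train), len(inner_test))
# prints: 4064 1629 2845 1219
```

Because the internal test set is carved out of the total training set, the 1629-case test set stays untouched by model fitting and early stopping.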
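
Screening variables with logistic regression before they are fed to the network can be sketched in NumPy as shown below. This sketch assumes the screening rule was "keep the variables significant at α = 0.05 in the multivariate model", with two-sided Wald tests. The abstract does not state the exact rule or the software used. The helper names and the synthetic data are hypothetical.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    Returns the coefficient vector (intercept first) and Wald standard errors
    taken from the inverse of the Fisher information matrix."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        info = Xd.T @ (Xd * (p * (1.0 - p))[:, None])   # Fisher information
        beta = beta + np.linalg.solve(info, Xd.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    info = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

def wald_p_values(beta, se):
    """Two-sided Wald p-values: P(|Z| > |beta / se|) = erfc(|z| / sqrt(2))."""
    return np.array([math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)])

def screen_variables(X, y, names, alpha=0.05):
    """Keep the variables whose Wald p-value is below alpha (assumed rule)."""
    beta, se = fit_logistic(X, y)
    p = wald_p_values(beta, se)[1:]          # drop the intercept
    odds_ratios = np.exp(beta[1:])           # OR_j = exp(beta_j)
    kept = [n for n, pv in zip(names, p) if pv < alpha]
    return kept, dict(zip(names, odds_ratios)), dict(zip(names, p))

# Synthetic data (NOT the study data): x1 raises risk, x3 lowers it,
# x2 and x4 are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
true_logit = -1.0 + 0.8 * X[:, 0] - 0.6 * X[:, 2]
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

kept, odds_ratios, p_values = screen_variables(X, y, ["x1", "x2", "x3", "x4"])
print(kept)
print({k: round(float(v), 2) for k, v in odds_ratios.items()})
```

The variables in `kept` would then become the inputs of the screened BP network. A noise variable can still pass a 0.05 threshold by chance, which is one reason the screened model should be checked on the held-out test set.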
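
A BP neural network is a feed-forward network whose weights are fitted by error back-propagation. The NumPy sketch below makes several choices the abstract does not specify: one hidden layer of tanh units, a sigmoid output, full-batch gradient descent on cross-entropy, and early stopping on the internal test set. The architecture, learning rate, patience, class and function names, and synthetic data are therefore all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-12):
    """Mean binary cross-entropy."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

class BPNetwork:
    """One tanh hidden layer and a sigmoid output unit, trained by back-propagation."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, size=n_hidden)
        self.b2 = 0.0

    def forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)          # hidden activations
        return H, sigmoid(H @ self.W2 + self.b2)    # predicted P(diabetes)

    def predict_proba(self, X):
        return self.forward(X)[1]

    def step(self, X, y, lr):
        """One full-batch gradient-descent step on the mean cross-entropy."""
        H, p = self.forward(X)
        d_out = (p - y) / len(y)                           # error at the output logit
        d_hid = np.outer(d_out, self.W2) * (1.0 - H ** 2)  # error propagated back through tanh
        self.W2 -= lr * (H.T @ d_out)
        self.b2 -= lr * float(d_out.sum())
        self.W1 -= lr * (X.T @ d_hid)
        self.b1 -= lr * d_hid.sum(axis=0)

    def get_params(self):
        return [self.W1.copy(), self.b1.copy(), self.W2.copy(), self.b2]

    def set_params(self, params):
        self.W1, self.b1 = np.copy(params[0]), np.copy(params[1])
        self.W2, self.b2 = np.copy(params[2]), params[3]

def train_with_early_stopping(net, X_tr, y_tr, X_val, y_val,
                              lr=0.5, max_epochs=3000, patience=100):
    """Stop once the internal-test loss has not improved for `patience` epochs,
    then restore the best weights seen (limits over-fitting)."""
    best_loss, best_params, wait = np.inf, net.get_params(), 0
    for _ in range(max_epochs):
        net.step(X_tr, y_tr, lr)
        loss = log_loss(y_val, net.predict_proba(X_val))
        if loss < best_loss:
            best_loss, best_params, wait = loss, net.get_params(), 0
        else:
            wait += 1
            if wait >= patience:
                break
    net.set_params(best_params)
    return best_loss

# Synthetic data (not the study data); risk is non-linear in the second feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(1500, 3))
y = (rng.random(1500) < sigmoid(0.5 + 1.5 * X[:, 0] - 1.0 * X[:, 1] ** 2)).astype(float)

# 7:3 split into a training part and an internal test part.
X_tr, y_tr, X_val, y_val = X[:1050], y[:1050], X[1050:], y[1050:]

net = BPNetwork(n_in=3, n_hidden=8, seed=0)
val_loss = train_with_early_stopping(net, X_tr, y_tr, X_val, y_val)
baseline_loss = log_loss(y_val, np.full(len(y_val), y_tr.mean()))
print(f"internal-test loss {val_loss:.3f} vs. constant-prevalence baseline {baseline_loss:.3f}")
```

The hidden layer lets the network capture effects such as the squared term in the synthetic data, which a main-effects logistic model cannot represent. This is one plausible reason a network can outperform logistic regression on AUC.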
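
The four evaluation measures named above can be computed from labels and predicted probabilities as in the stdlib-only sketch below. The 0.5 classification cut-off and the toy values are assumptions, since the abstract does not state the threshold. The 95% confidence intervals reported for the AUC (for example by the DeLong method or by bootstrap) are not computed here.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = diabetes)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher than a
    random negative case (ties count one half), i.e. the Mann-Whitney statistic."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy example: predicted probabilities, classified at an assumed 0.5 cut-off.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec, acc = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(sens, 3), round(spec, 3), round(acc, 3), round(auc, 3))
# prints: 0.667 0.667 0.667 0.889
```

Sensitivity, specificity and accuracy depend on the chosen cut-off, whereas the AUC summarizes ranking quality over all cut-offs. This is why the AUC is the main basis for comparing the three models.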
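
The stability check refits and re-scores a model as the held-out proportion moves from 10% to 50%. The sketch below shows this for a logistic regression model alone, on synthetic data. It draws one random split per proportion, whereas the abstract does not say how many repetitions were run. The seed and helper names are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns coefficients, intercept first."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        info = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, Xd.T @ (y - p))
    return beta

def auc(y, s):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos, neg = s[y == 1], s[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

# Synthetic data only; the study's records are not reproduced here.
rng = np.random.default_rng(2)
n = 3000
X = rng.normal(size=(n, 3))
logit = -0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

results = {}
for test_frac in (0.1, 0.2, 0.3, 0.4, 0.5):
    order = rng.permutation(n)
    n_test = int(round(test_frac * n))
    test_idx, train_idx = order[:n_test], order[n_test:]
    beta = fit_logistic(X[train_idx], y[train_idx])
    # The linear predictor ranks cases exactly as the predicted probability does.
    scores = np.column_stack([np.ones(n_test), X[test_idx]]) @ beta
    results[test_frac] = auc(y[test_idx], scores)

for frac, value in results.items():
    print(f"test set {frac:.0%}: AUC = {value:.3f}")
```

A model is judged stable if its AUC varies little across these proportions. Repeating each split several times and averaging would make that judgment less sensitive to one lucky or unlucky draw.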
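
The odds ratios reported above come from the multivariate logistic model. The standard relationship between a fitted coefficient and its odds ratio is shown below. The abstract does not state how each variable was coded (continuous units or categories), so reading each OR "per one-unit increase" is an assumption.

```latex
\operatorname{logit} P(Y = 1 \mid \mathbf{x})
  = \ln \frac{P(Y = 1 \mid \mathbf{x})}{1 - P(Y = 1 \mid \mathbf{x})}
  = \beta_0 + \sum_{j} \beta_j x_j,
\qquad
\mathrm{OR}_j = e^{\beta_j}

\beta_{\mathrm{BMI}} = \ln 2.066 \approx 0.726,
\qquad
\beta_{\mathrm{HDL\text{-}C}} = \ln 0.550 \approx -0.598
```

Holding the other variables fixed, a one-unit increase in $x_j$ multiplies the odds of T2DM by $\mathrm{OR}_j$. The odds are multiplied by 2.066 for BMI and by 0.550 for HDL-C, a 45% reduction, which is why HDL-C and LDL-C (OR below 1) appear as inversely associated with T2DM.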
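
The importance percentages quoted for the BP neural network appear to be normalized so that the strongest predictor (BMI) scores 100%. The abstract does not say how raw importance was computed. The sketch below therefore swaps in permutation importance, defined as the drop in AUC when one input column is shuffled. This is a different, model-agnostic technique, not necessarily the one used in the study. A fixed hypothetical scoring rule stands in for a trained network.

```python
import numpy as np

def auc(y, s):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos, neg = s[y == 1], s[y == 0]
    return float((pos[:, None] > neg[None, :]).mean()
                 + 0.5 * (pos[:, None] == neg[None, :]).mean())

def permutation_importance(score_fn, X, y, seed=0):
    """Raw importance of column j = baseline AUC minus AUC with column j shuffled."""
    rng = np.random.default_rng(seed)
    base = auc(y, score_fn(X))
    raw = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        raw.append(base - auc(y, score_fn(Xp)))
    return np.array(raw)

def normalize(raw):
    """Express each importance as a percentage of the largest one."""
    return 100.0 * (raw / raw.max())

def score_fn(A):
    """Hypothetical stand-in model: column 0 matters most, column 2 is unused."""
    return 2.0 * A[:, 0] + 0.5 * A[:, 1]

# Synthetic data only.
rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 3))
y = (score_fn(X) + rng.normal(size=2000) > 0).astype(float)

normalized = normalize(permutation_importance(score_fn, X, y))
print(np.round(normalized, 1))
```

Normalized values only rank predictors relative to the top one. For example, TG at 36.2% means roughly a third of BMI's contribution under whatever raw measure was used, not a 36.2% share of the total.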