| Objective: This study aimed to analyze the epidemic status and influencing factors of hypertension among residents aged 18~69 years, using the 2015 survey of chronic diseases and their risk factors in Shenzhen; to establish a risk assessment model for hypertension based on the Harvard Cancer Risk Index; and to construct risk prediction models for hypertension based on machine learning algorithms. The work is intended to provide scientific evidence and an assessment instrument for the prevention and control of hypertension in China, as well as a new approach to risk assessment of chronic diseases. Methods: Using multi-stage random cluster sampling, 10 communities were first randomly selected from each administrative district in Shenzhen, 130 households were then randomly selected from each community, and 1 resident aged 18~69 years was selected from each household. Data on demographic characteristics, behavioral lifestyle, and physiological and biochemical indicators were collected. A total of 10058 people were included in the analysis. Measurement data were described as mean ± standard deviation, and two groups were compared using Student's t test or the rank sum test. Categorical data were described as frequencies and proportions to characterize the distribution of hypertension; the chi-square test and Fisher's exact test were used to compare hypertension prevalence between groups, and the trend chi-square test was used for trend analysis. Logistic regression was used to investigate the influencing factors of hypertension, and the Harvard Cancer Risk Index was used to establish the risk assessment model for hypertension. Machine learning algorithms, including logistic regression, random forest and support vector machine, were used to construct the risk prediction models. Data analysis, including descriptive and multivariable analysis, was performed mainly with SPSS 25.0 and Python 3.6; Python 3.6 was also used to build the machine learning prediction models. The
receiver operating characteristic (ROC) curve analysis was performed with MedCalc 18.2.1. Results: 1. Basic characteristics of the study population. A total of 10058 residents aged 18~69 years in Shenzhen were surveyed, with a mean age of 43.58±12.00 years. Among them, 4112 (40.88%) were male, with a mean age of 43.13±11.81 years, and 5946 (59.12%) were female, with a mean age of 43.90±12.12 years. Most residents (97.55%) were of Han ethnicity; most had an education level of high school, secondary school, technical school or below (73.63%), were married (89.55%), and had medical insurance (90.51%). 2. Epidemic status of hypertension among residents in Shenzhen. The crude prevalence of hypertension among residents aged 18~69 years was 22.72% (27.09% in males and 19.69% in females), with a significant difference between sexes (χ2 = 75.768, P < 0.001). The standardized prevalence of hypertension was 20.07% (24.72% in males and 16.81% in females). The trend chi-square test showed that the prevalence of hypertension increased with age (Z = 7.718, P < 0.001) and decreased with education level (Z = -3.927, P < 0.001). Prevalence differed significantly by marital status (χ2 = 145.725, P < 0.001) and was highest among the widowed (50.00%). Occupation, smoking, drinking, salty taste, sleeping time, BMI, waist circumference, TG, HDL-C and diabetes were all associated with hypertension (P < 0.05). 3. The risk assessment model of hypertension based on the Harvard Cancer Risk Index. The influencing factors identified by multivariate logistic regression were gender, age, education, drinking, BMI, waist circumference, high TG, low HDL-C and diabetes. In the risk assessment model constructed with the Harvard Cancer Risk Index, the average risk score of the population was 34.06, and the number of hypertension patients increased with risk level (Z = 8.600, P < 0.001). The
evaluation of the model's predictive performance showed an AUC of 0.768 (95% CI: 0.749~0.786). The best cut-off point was a ratio R of 0.954, which could be used as the positive critical point for predicting hypertension in individuals; at this point Youden's index reached its maximum of 0.43, and the sensitivity and specificity were 80.2% and 62.4%, respectively. 4. The risk prediction models based on machine learning. Feature selection was carried out using the information gain ranking criterion, and eight features were included in the machine learning models: age, BMI, waist circumference, diabetes, education, HDL-C, TG and gender. Compared with no resampling, applying SMOTE (Synthetic Minority Over-sampling Technique) improved the AUC, F1-score and sensitivity of the prediction models built with all three machine learning algorithms. The AUCs of the hypertension risk prediction models built with logistic regression, random forest and support vector machine (SVM) were 0.776 (95% CI: 0.757~0.794), 0.774 (95% CI: 0.755~0.792) and 0.778 (95% CI: 0.759~0.796), respectively. Among the models, SVM achieved the highest sensitivity (0.77), while the F1-scores were all 0.51, which is not satisfactory. Pairwise comparison of the ROC curves using the DeLong method showed no significant difference between any two of the three machine learning algorithms (P > 0.05), indicating that their prediction performance was equivalent. The best cut-off point (threshold), which could be considered the positive critical point for predicting hypertension in individuals, was as follows: for logistic regression, the best cut-off point was 0.4546, Youden's index was 0.43, sensitivity was 79.74% and specificity was 62.91%; for random forest, the best cut-off point was 0.4470, Youden's index was 0.43, sensitivity was 82.35% and specificity was 60.01%; for SVM, the best cut-off point was 0.4949, Youden's index
was 0.43, sensitivity was 76.47% and specificity was 66.13%. Conclusions: 1. The prevalence of hypertension among residents aged 18~69 years in Shenzhen in 2015 was lower than the national level, and it differed significantly across demographic characteristics and behavioral lifestyles. 2. The main influencing factors of hypertension include age, BMI, waist circumference, diabetes, education, HDL-C, TG, gender and drinking. 3. The risk assessment model of hypertension based on the Harvard Cancer Risk Index has moderate performance, can effectively quantify and stratify disease risk, and can be used as a tool for hypertension risk assessment; however, its application needs further verification and improvement. 4. The risk prediction models of hypertension based on machine learning algorithms all have moderate performance and can provide evaluation tools for the prevention and control of hypertension in community healthcare service centers. However, applying the models in practice requires external validation. |
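
The sex difference in crude prevalence reported in the Results can be checked by hand. The sketch below is illustrative only and is not the authors' analysis code: the case counts are reconstructed by rounding the reported percentages (an assumption, since the abstract gives no raw counts), and the function name `chi_square_2x2` is hypothetical. It applies the standard Pearson chi-square formula for a 2x2 table without continuity correction.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Group sizes are taken from the abstract.
male_total, female_total = 4112, 5946

# ASSUMPTION: case counts are reconstructed from the reported prevalences
# (27.09% and 19.69%); the abstract itself does not report raw counts.
male_cases = round(0.2709 * male_total)
female_cases = round(0.1969 * female_total)

chi2 = chi_square_2x2(
    male_cases, male_total - male_cases,
    female_cases, female_total - female_cases,
)
crude_prevalence = (male_cases + female_cases) / (male_total + female_total)

print(f"chi-square = {chi2:.3f}")
print(f"crude prevalence = {crude_prevalence:.2%}")
```

Under these assumptions the statistic should land very close to the reported χ2 = 75.768, and the pooled prevalence close to the reported 22.72%, which suggests the reported test was an uncorrected Pearson chi-square.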
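
The cut-off points above were chosen by maximizing Youden's index, J = sensitivity + specificity - 1. As a minimal sketch (not the authors' MedCalc procedure), the code below recomputes J from the reported sensitivity and specificity pairs and shows one simple way to pick a threshold by scanning candidate cut-offs. The helper names `youden_index` and `best_cutoff` and the toy scores and labels are hypothetical and exist only for illustration.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic."""
    return sensitivity + specificity - 1.0


def best_cutoff(scores, labels):
    """Try every observed score as a threshold (predict positive when
    score >= threshold) and return (threshold, J, sensitivity, specificity)
    for the threshold with the largest J."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens, spec = tp / positives, tn / negatives
        j = youden_index(sens, spec)
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best


# Sensitivity and specificity pairs as reported in the abstract.
reported = {
    "Harvard Cancer Risk Index": (0.802, 0.624),
    "Logistic Regression": (0.7974, 0.6291),
    "Random Forest": (0.8235, 0.6001),
    "SVM": (0.7647, 0.6613),
}
youden = {name: youden_index(se, sp) for name, (se, sp) in reported.items()}
for name, j in youden.items():
    print(f"{name}: J = {j:.3f}")

# HYPOTHETICAL toy data, only to demonstrate the threshold scan.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
toy_labels = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]
best = best_cutoff(toy_scores, toy_labels)
print(best)
```

All four recomputed values fall within 0.01 of the reported 0.43. The random forest pair gives roughly 0.424, slightly below the reported figure, which may reflect rounding in the reported sensitivity and specificity.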