With the rapid development of the economy, people's lives have changed greatly, and changes in dietary structure have had many negative effects on health. Diabetes is a chronic disease closely related to lifestyle habits, and its incidence is increasing year by year. The diagnosis of diabetes relies mainly on the doctor's clinical experience and the patient's physical examination data. In the early stage of diabetes, changes in blood glucose are not obvious, which makes the diagnosis difficult to confirm. At the same time, with the development of science and technology, the volume of medical data keeps growing, and the application of artificial intelligence in the medical field has become an inevitable trend. Diagnosing diabetes often requires multiple physical examinations, which not only add to the patient's discomfort but also delay treatment. Using artificial intelligence on existing medical data to assist doctors in diagnosis can reduce the probability of missed diagnosis and misdiagnosis, and can reduce the number of examinations a patient must undergo.

The data used in this thesis come from the "artificial intelligence-assisted prediction of genetic risk of diabetes" task of the Tianchi Precision Medical Competition. To build auxiliary diagnosis systems for various types of diabetes, the data should cover as many different populations as possible. The data fall into two types: physical examination data of the general population, and physical examination data of a special population (pregnant women). Because the diagnostic criteria for diabetes in the general population differ from those for gestational diabetes in pregnant women, the two types of data are modeled and predicted separately. The label in the general examination data is "blood glucose", so predicting it is a regression problem; the label in the pregnancy examination data is whether the patient has gestational diabetes mellitus, so predicting it is a classification problem.

The main research contents of this thesis are as follows:

(1) Model selection. After analyzing the raw data and reviewing a large number of papers on predicting medical conditions, three models were selected: XGBoost, LightGBM, and CatBoost. Each single model is first optimized to achieve its best result, and the models are then fused; the fused model can achieve better results.

(2) Data preprocessing. The original data contain many missing values, which must be handled. Features with too many missing values are deleted directly so that they do not disturb model training.

(3) Feature processing. By analyzing the relationship between each field and the label, the importance of each feature is obtained; important features are then crossed with one another to construct better features, which helps improve prediction accuracy. Weakly relevant features are removed to avoid interfering with the model results.

(4) Modeling and analysis. Using the features obtained in (3), XGBoost, LightGBM, and CatBoost models are built and tuned with a genetic algorithm to obtain the best single models. These models are then fused by stacking, and the parameters are adjusted to obtain a better fusion model.

(5) System design and implementation. The model obtained through the data processing, feature selection, model building, and tuning described in the preceding chapters is applied in a system. Through requirements analysis, system architecture design, database design, and front-end design, a comprehensive diabetes diagnosis system was built.
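Step (2) says only that features with too many missing values are deleted. The following is a minimal sketch of that rule, assuming a missing-ratio cutoff of 0.5 and median imputation for the gaps in the surviving features; both the cutoff and the imputation strategy are assumptions, not values reported in the thesis, and the field names in the example are hypothetical.

```python
from statistics import median

# Hypothetical cutoff: the abstract does not state the threshold actually used.
MISSING_THRESHOLD = 0.5


def drop_sparse_features(columns, threshold=MISSING_THRESHOLD):
    """Delete features whose share of missing values (None) exceeds `threshold`,
    then fill the gaps in the remaining features with the column median
    (median imputation is an assumed strategy)."""
    cleaned = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        if not observed:
            continue  # empty or entirely missing column: nothing to keep
        missing_ratio = 1 - len(observed) / len(values)
        if missing_ratio > threshold:
            continue  # too sparse: drop the feature entirely
        fill = median(observed)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


if __name__ == "__main__":
    # Toy examination columns; the field names are illustrative only.
    data = {
        "age": [45, 52, None, 61],
        "triglyceride": [1.2, None, 1.8, 2.1],
        "hepatitis_b_antigen": [None, None, None, 0.3],
    }
    print(drop_sparse_features(data))
```

Here `hepatitis_b_antigen` (75% missing) is dropped, while the other two columns are kept and their gaps filled with their medians.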
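Step (3) ranks features by their relationship to the label, crosses the important ones, and discards weakly relevant ones. A minimal sketch of that idea follows. It assumes that importance is measured as the absolute Pearson correlation with the label and that a "cross" is the pairwise product of two kept features. The thesis may instead use model-based importance or other crossing operations, and `top_k` is a hypothetical parameter.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_and_cross(features, label, top_k=2):
    """Keep the top_k features by |correlation with the label|, drop the rest,
    and add the pairwise products of the kept features as crossed features."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], label)), reverse=True)
    kept_names = ranked[:top_k]
    result = {name: features[name] for name in kept_names}
    for a, b in combinations(kept_names, 2):
        result[f"{a}*{b}"] = [x * y for x, y in zip(features[a], features[b])]
    return result


if __name__ == "__main__":
    feats = {"f1": [1, 2, 3, 4], "f2": [2, 4, 6, 9], "noise": [1, 0, 1, 0]}
    print(select_and_cross(feats, label=[1, 2, 3, 4]))
```

Correlation only captures linear relationships, so for tree models such as XGBoost the gain-based importance reported by the fitted model is a common alternative.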
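Step (4) tunes each single model with a genetic algorithm. The sketch below shows a generic genetic search with tournament selection, uniform crossover, mutation, and elitism. The search space (`max_depth`, `learning_rate` ranges) and `placeholder_fitness` are hypothetical: in the thesis setting the fitness would be a cross-validated score of an XGBoost, LightGBM, or CatBoost model, which is replaced here by a toy function so the code runs without those libraries.

```python
import random

# Hypothetical search space: the thesis does not report the ranges it tuned.
SPACE = {
    "max_depth": (3, 10),          # integer hyperparameter
    "learning_rate": (0.01, 0.3),  # real-valued hyperparameter
}


def placeholder_fitness(params):
    """Hypothetical stand-in for a cross-validated score (higher is better).
    In practice this would train a gradient-boosting model with `params`."""
    return -((params["max_depth"] - 6) ** 2) - 100 * (params["learning_rate"] - 0.1) ** 2


def random_individual(rng):
    return {
        "max_depth": rng.randint(*SPACE["max_depth"]),
        "learning_rate": rng.uniform(*SPACE["learning_rate"]),
    }


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from either parent with p = 0.5.
    return {k: a[k] if rng.random() < 0.5 else b[k] for k in a}


def mutate(ind, rng, rate=0.2):
    # Each gene is re-sampled from the search space with probability `rate`.
    fresh = random_individual(rng)
    return {k: fresh[k] if rng.random() < rate else v for k, v in ind.items()}


def tournament(pop, scores, rng, k=3):
    # Sample k individuals and return the one with the highest score.
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for gen in range(generations):
        scores = [fitness(ind) for ind in pop]
        for ind, score in zip(pop, scores):
            if score > best_score:
                best, best_score = ind, score
        if gen == generations - 1:
            break  # last generation evaluated; no need to breed another
        children = [best]  # elitism: keep the best individual found so far
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores, rng), tournament(pop, scores, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    return best, best_score


if __name__ == "__main__":
    params, score = genetic_search(placeholder_fitness)
    print(params, score)
```

Elitism guarantees that the best score never decreases from one generation to the next. Because each fitness evaluation would involve training a boosting model under cross-validation, population size and generation count mainly trade tuning time against search coverage.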
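Step (4) then fuses the single models by stacking. The sketch below shows two-level stacking for the regression (blood glucose) task: each base model produces out-of-fold predictions, a meta-learner is trained on those predictions, and the base models are refit on all data to produce features for new rows. As an assumption that keeps the example dependency-light, closed-form ridge regressions with different penalties (`base_alphas`) stand in for XGBoost, LightGBM, and CatBoost, and a lightly regularized ridge regression serves as the meta-learner. This is not the thesis's actual configuration.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept column."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict_ridge(w, X):
    return np.hstack([np.ones((X.shape[0], 1)), X]) @ w


def stack(X, y, X_new, base_alphas=(0.1, 10.0, 100.0), meta_alpha=1.0, n_folds=5, seed=0):
    """Two-level stacking: out-of-fold base predictions train the meta-learner."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    oof = np.zeros((n, len(base_alphas)))
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        for j, alpha in enumerate(base_alphas):
            w = fit_ridge(X[train_idx], y[train_idx], alpha)
            oof[val_idx, j] = predict_ridge(w, X[val_idx])
    # Meta-learner: ridge keeps the weights stable when base predictions are highly correlated.
    meta_w = fit_ridge(oof, y, meta_alpha)
    # Refit each base model on all training data to build level-1 features for new rows.
    base_new = np.column_stack([predict_ridge(fit_ridge(X, y, a), X_new) for a in base_alphas])
    return predict_ridge(meta_w, base_new)
```

Training the meta-learner on out-of-fold predictions, rather than on in-sample ones, prevents it from simply trusting whichever base model overfits most. For the gestational-diabetes classification data, the analogous setup would use classifiers as base models and, for example, a logistic-regression meta-learner on their predicted probabilities.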