| In recent years, with rising living standards and increasingly diverse diets, the number of people with diabetes has also grown. Diabetes has gradually become one of the main threats to human health, and gestational diabetes mellitus (GDM) is a special type of the disease. GDM can seriously affect both the pregnant woman and the fetus and may even be life-threatening, so early screening for GDM is very important. Machine learning has matured considerably, and many fields now apply machine learning models to their problems, some with remarkable results. Machine learning models can therefore be applied to predicting GDM as an early-screening tool, enabling early detection and early treatment. This paper uses the GDM dataset provided by the Tianchi Precision Medicine Competition "Artificial Intelligence-Assisted Diabetes Genetic Risk Prediction", jointly organized by Alibaba Cloud and Qingwutong Gene. It builds several machine learning models, analyzes and evaluates them, and identifies the best-performing one. The specific research contents are as follows. (1) Algorithm selection. After reviewing the literature on GDM and considering the characteristics of the dataset, four algorithms were selected for modeling after preliminary exploration: logistic regression, support vector machine (SVM), XGBoost and CatBoost. (2) Data preprocessing. The original dataset contains many missing values and outliers, and the features differ greatly in magnitude. Missing values of discrete features are therefore filled with the category "Null", missing values of continuous features are filled with the feature mean, and outliers of continuous features are replaced with the median. After that, the discrete variables are
one-hot encoded, and the continuous variables are normalized. (3) Feature selection for the prediction model. The dataset contains many features, some of which are redundant, and redundant features add computational burden. After preprocessing, this paper therefore applies SVM-RFECV for feature selection. SVM-RFECV is recursive feature elimination with cross-validation, using an SVM as the base estimator. It retains 47 features, which are passed to the subsequent machine learning models. (4) Modeling and evaluation. After preprocessing and feature selection, the dataset is split into a training set and a test set at a ratio of 8:2. Each of the four algorithms from (1) is used to build a model, and hyperparameters are tuned by grid search combined with cross-validation. Once the optimal parameters are determined, the evaluation metrics of each model are analyzed. The paper then applies three ensemble strategies, Voting, Blending and Stacking, which fuse the four models in different ways. All models are evaluated by F1 score and AUC. The experimental results show that Stacking outperforms the other models. In that configuration, logistic regression, CatBoost, SVM and XGBoost serve as the base (first-level) learners and CatBoost as the meta (second-level) learner. Stacking is therefore the most effective of the evaluated methods for predicting the risk of gestational diabetes mellitus. |
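The preprocessing in step (2) can be sketched with pandas as below. This is a minimal illustration, not the author's code, and several details are assumptions the abstract does not state:

- outliers are detected with the 1.5 × IQR rule;
- outliers are replaced before missing values are filled;
- "normalized" means min-max scaling to [0, 1].

The column names `family_history` and `glucose` are hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df, discrete_cols, continuous_cols):
    """Minimal sketch of preprocessing step (2); not the author's implementation.

    Assumptions: 1.5 * IQR outlier rule, outliers replaced before missing
    values are filled, and min-max scaling as the normalization.
    """
    out = df.copy()

    # Discrete features: missing values become their own category, "Null".
    for col in discrete_cols:
        out[col] = np.where(out[col].isna(), "Null", out[col].astype(str))

    for col in continuous_cols:
        values = out[col].astype(float)
        # Outliers (1.5 * IQR rule, an assumption) -> median of observed values.
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        is_outlier = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
        values = values.mask(is_outlier, values.median())
        # Missing values -> mean of the (outlier-cleaned) observed values.
        values = values.fillna(values.mean())
        # Min-max normalization to [0, 1] (assumed meaning of "normalized").
        lo, hi = values.min(), values.max()
        out[col] = (values - lo) / (hi - lo) if hi > lo else 0.0

    # One-hot encode the discrete features: one 0/1 column per category.
    return pd.get_dummies(out, columns=discrete_cols, dtype=int)


# Hypothetical toy data; the column names are illustrative only.
toy = pd.DataFrame({
    "family_history": ["yes", "no", None, "yes", "no", "yes", None, "no"],
    "glucose": [5.0, 5.1, np.nan, 5.3, 5.2, 4.9, 5.0, 60.0],
})
clean = preprocess(toy, ["family_history"], ["glucose"])
print(clean)
```

In the toy data, the value 60.0 is flagged as an outlier and replaced by the median 5.1. Missing entries in `family_history` become a `family_history_Null` indicator column.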
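The SVM-RFECV feature selection in step (3) maps naturally onto scikit-learn's `RFECV` with a linear-kernel `SVC`. The sketch below runs it on synthetic data from `make_classification`, which stands in for the preprocessed GDM table. The step size, fold count and F1 scoring are assumptions, not the paper's settings, so it will not reproduce the paper's 47 features.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed data set.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=5, random_state=0)

# A linear kernel exposes coef_, which RFE uses to rank features.
selector = RFECV(
    estimator=SVC(kernel="linear"),
    step=1,                      # drop one feature per elimination round
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="f1",                # assumed selection criterion
    min_features_to_select=1,
)
selector.fit(X, y)

print("features kept:", selector.n_features_)
X_selected = selector.transform(X)   # keeps only the retained columns
```

`RFECV` picks the feature count with the best cross-validated score. The boolean mask `selector.support_` shows which original columns survive.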
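The modeling procedure in step (4) can be outlined in scikit-learn. It makes an 8:2 train/test split, tunes each model by grid search with cross-validation on the training set, and scores the tuned model on the test set by F1 and AUC. The following is a sketch on synthetic data:

- The parameter grids, the stratified split and the choice of AUC as the tuning metric are assumptions.
- Only logistic regression and SVM are shown. `xgboost.XGBClassifier` and `catboost.CatBoostClassifier` follow the scikit-learn estimator interface and could be added to `candidates` in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=15, random_state=1)

# 8:2 split; stratify keeps the class ratio equal in both parts (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Hypothetical search spaces; the paper's actual grids are not given.
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000),
                            {"C": [0.01, 0.1, 1, 10]}),
    "svm": (SVC(probability=True, random_state=1),
            {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validation on the training set only.
    search = GridSearchCV(model, grid, scoring="roc_auc", cv=5)
    search.fit(X_train, y_train)
    best = search.best_estimator_        # refit on the full training set
    proba = best.predict_proba(X_test)[:, 1]
    results[name] = {
        "params": search.best_params_,
        "f1": f1_score(y_test, best.predict(X_test)),
        "auc": roc_auc_score(y_test, proba),
    }
    print(name, results[name])
```

The test set is touched only after tuning, so the reported F1 and AUC are not biased by the grid search.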