
Prediction Of Gestational Diabetes Based On Machine Learning

Posted on: 2020-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: J L Wang
GTID: 2404330590471023
Subject: Applied Statistics
Abstract/Summary:
With rising living standards, public awareness of disease prevention is gradually improving. During pregnancy in particular, predicting the risk of gestational diabetes in advance has become a major need of millions of households. The intelligent health industry has emerged to meet this need, so that pregnant women can learn about their possible conditions as early as possible and take preventive and control measures in time. Thanks to the rapid development of the Internet, a large amount of health data is now stored digitally, such as diagnostic records, electronic records and medical records. With advances in science and artificial intelligence, machine learning has matured and its classification ability has grown stronger, providing a new method for predicting gestational diabetes. Based on machine learning and classifier theory and on medical data from the Tianchi big data platform, this paper aims to build the gestational diabetes mellitus model with the best predictive performance, in the hope that high-risk pregnant women can be identified as early as possible and receive effective intervention in time, thereby improving the quality of life and health of pregnant women.

Data preprocessing in this paper covers outlier detection and treatment and the handling of missing values. Outliers are detected mainly with the box plot method and replaced with the mean value. For missing values, variables with a missing rate higher than 0.5 are deleted; among the remaining variables, continuous variables are filled by multiple imputation, while discrete variables are filled with the fixed value -999. Filling missing values according to variable type in this way works better than applying one uniform fill that ignores the type of the variable.

After data preprocessing, this paper uses the IV method to screen features. From the 83 features in the original sample data, 40 important features were selected through feature engineering; they consist of 17 original features, including VAR00007, SNP34 and SNP37, and their combinations. This shows that predicting gestational diabetes requires collecting data on only 17 features rather than blindly collecting as much data as possible, which saves data collection time and facilitates early prediction. These important features also indicate directions for the prevention and control of gestational diabetes.

To construct the predictive model, this paper builds a logistic regression model, a Lasso-logistic model, GBDT, XGBoost, LightGBM and CatBoost models, and an ensemble that integrates multiple learners. The comparison shows that the ensemble of multiple learners is the best in terms of model stability and generalization ability, with AUC values of 0.7889 on the training set and 0.7986 on the test set. This suggests that an ensemble of XGBoost, LightGBM and CatBoost learners predicts gestational diabetes well.

Most existing studies use logistic models to analyze the risk factors for gestational diabetes and rarely predict whether a given pregnant woman will develop it. Moreover, most analyses rely on logistic models and rarely use machine learning models for prediction. This paper makes some useful attempts in both research perspective and modeling, and its ideas and methods may also serve as a reference for the prediction and early warning of other diseases.
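The preprocessing rules described in the abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the box plot fence multiplier (1.5 is assumed here), does not say whether the replacement mean includes the outliers (inliers only are assumed), and the function names and the use of `None` for missing entries are hypothetical. Multiple imputation for continuous variables is not shown.

```python
import statistics

MISSING = None  # assumption: missing entries are represented as None


def replace_outliers_with_mean(values, k=1.5):
    """Replace box-plot outliers with the mean of the remaining values.

    Assumptions (not stated in the abstract): fences at Q1 - k*IQR and
    Q3 + k*IQR with k = 1.5, and the mean taken over the inliers only.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    mean = statistics.fmean(inliers)
    return [v if low <= v <= high else mean for v in values]


def drop_sparse_columns(table, max_missing_rate=0.5):
    """Drop columns whose missing rate is higher than max_missing_rate."""
    kept = {}
    for name, column in table.items():
        rate = sum(v is MISSING for v in column) / len(column)
        if rate <= max_missing_rate:
            kept[name] = column
    return kept


def fill_discrete(column, sentinel=-999):
    """Fill missing entries of a discrete variable with a fixed sentinel."""
    return [sentinel if v is MISSING else v for v in column]
```

Using a sentinel such as -999 keeps "missing" as its own category for discrete variables, which tree-based learners can split on directly.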
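The abstract names "IV" as the feature screening criterion without defining it. Assuming it refers to information value as commonly used for binary targets, a minimal sketch of the computation for one already-binned feature might look as follows. The function name, the `eps` floor that guards against empty bins, and the 0/1 label encoding are assumptions made for illustration.

```python
import math
from collections import Counter


def information_value(bins, labels, eps=1e-6):
    """Information value of a binned feature against 0/1 labels.

    IV = sum over bins of (p_pos - p_neg) * ln(p_pos / p_neg), where
    p_pos and p_neg are the shares of positive and negative samples
    that fall into the bin. eps is a hypothetical floor that avoids
    log(0) when a bin contains only one class.
    """
    pos = Counter(b for b, y in zip(bins, labels) if y == 1)
    neg = Counter(b for b, y in zip(bins, labels) if y == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    iv = 0.0
    for b in set(bins):
        p_pos = max(pos[b] / n_pos, eps)
        p_neg = max(neg[b] / n_neg, eps)
        iv += (p_pos - p_neg) * math.log(p_pos / p_neg)
    return iv
```

A feature whose bins have the same class mix everywhere scores 0, and larger values indicate stronger separation between the classes, so features can be ranked by this score and those below a chosen threshold dropped.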
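The abstract does not say how the multiple learners are integrated. One simple possibility, assumed here purely for illustration, is to average the predicted probabilities of the individual models and evaluate the blend with AUC. The sketch below uses made-up scores standing in for the outputs of three trained models; no real XGBoost, LightGBM or CatBoost calls are involved.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (pairwise rank statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_scores(*model_scores):
    """Blend several models by averaging their per-sample scores
    (an assumed integration scheme, not confirmed by the source)."""
    return [sum(s) / len(s) for s in zip(*model_scores)]


# Toy scores for illustration only; not data from the thesis.
labels = [0, 0, 1, 1]
model_a = [0.1, 0.6, 0.4, 0.9]
model_b = [0.3, 0.2, 0.7, 0.6]
model_c = [0.2, 0.4, 0.5, 0.8]
blend = average_scores(model_a, model_b, model_c)
```

In this toy example `auc(labels, model_a)` is 0.75 while `auc(labels, blend)` is 1.0, illustrating how averaging can smooth out the ranking errors of a single model. The pairwise AUC shown is quadratic in sample size and is meant for clarity rather than for large datasets.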
Keywords/Search Tags:Gestational diabetes, Machine learning, Classification prediction, Ensemble learning