
Factors Affecting Time To Pregnancy And Predict Models Of Fecundity Of Rural Women In Henan Province

Posted on: 2018-10-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Zhang
GTID: 1314330518967974
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: Epidemiological surveys have shown that fertility among women of childbearing age has declined in recent years, and fertility research has become one of the focal points of female reproductive health. Fertility assessments are generally divided into indirect and direct evaluation indicators. Biological markers such as hormone levels and semen quality can each be used to evaluate fertility, but they are indirect indicators. The only direct indicator is the waiting time to pregnancy (TTP), that is, the time from when a couple stops contraception and begins regular sexual intercourse until pregnancy is achieved. At present, fertility prediction models at home and abroad are built mainly on semen quality, frequency of sexual intercourse and similar indicators. Few studies have applied data mining methods to large datasets covering sociodemographic, physiological and psychological characteristics in order to predict fertility. With the rapid development of information technology, data volumes have grown rapidly, and data have become larger and much more complex. Using such data effectively requires data mining methods such as machine learning. Data mining technology has become increasingly mature in social science and natural science research, but it is still not widely used in reproductive health research. Predicting fertility is therefore an important research direction in reproductive epidemiology.

Objective: This study analyzes the epidemiological risk factors for waiting time to pregnancy among women in rural areas, explores fertility classification and prediction models based on data mining technology, and compares the predictive ability of the models.

Methods: Data came from the national pre-pregnancy health check project in Henan Province, 2014. The inclusion criteria were 1) women aged 15-49 years; 2) women who were not currently pregnant and planned to become pregnant within six months. After the baseline survey, the researchers carried out a one-year follow-up before pregnancy and a one-year pregnancy outcome follow-up after pregnancy. All information was entered into the central database through the electronic data acquisition system. The study excluded women who self-reported infertility and women who were not planning a pregnancy. The final study population comprised 568850 women.

In the first part of the study (TTP risk factor analysis), data were first cleaned and preprocessed, followed by a basic statistical description. Categorical data were described by frequency and percentage, and quantitative data by means and standard errors. Normally distributed data were analyzed by t test or analysis of variance. Non-normally distributed data were analyzed by the Wilcoxon Z test, and frequency distributions were used to describe the distribution of quantitative variables. TTP was evaluated by the median time to conception, the cumulative conception probability, and the cumulative fertility probability curve, based on survival analysis with the Kaplan-Meier method. For the risk factors of TTP, the study mainly used the Cox proportional hazards regression model to calculate the fecundability ratio (FR) and its 95% confidence interval. For quantitative variables such as age at menarche, a restricted cubic spline regression model was used to analyze the relationship between quantitative exposure factors and the response variable by drawing a spline regression curve.

In the second part of the study (classification model study), data were first cleaned and missing values were imputed. Stepwise regression and collinearity diagnostics were used to screen the factors. Logistic regression, decision tree (CART) and random forest models were each used to build a classification prediction model, and the optimal model was selected by cross-validation. The models were compared on accuracy, area under the ROC curve and prediction ability. A random sample of 150,000 participants from the 2012-2013 data was used as a test dataset to study the generalization of the models.

Results: 1. Analysis of epidemiological risk factors of time to pregnancy. The younger age group, women with a lower education level and the peasant population had shorter TTP and higher cumulative pregnancy probability. In the analysis of contraceptive history, menstruation and birth history, the results showed higher values than the reference group for women who had ever used contraceptives, had an age at menarche above 14 years, had a menstrual period shorter than 5 days or longer than 6 days, had a menstrual cycle longer than 29 days, had too little or too much menstrual flow, or self-reported dysmenorrhea. For lifestyle factors, non-smokers, women not exposed to frequent or occasional passive smoking, non-drinkers, and women with a BMI of 18.5-24.9 had a significantly higher cumulative probability of pregnancy than the other exposure groups. The FRs of women with greater work pressure and economic stress were significantly lower than those of the other exposure groups.

2. Fertility classification models study. First, variables were screened and irrelevant variables were excluded. The logistic regression model, the decision tree (CART) model and the random forest model were each trained on 80%, 70% and 60% training sets. The corresponding areas under the ROC curve (AUC) were 0.69392, 0.69347 and 0.69453 for the logistic regression model; 0.70009, 0.69831 and 0.69839 for the CART model; and 0.75384, 0.75251 and 0.75068 for the random forest model. To compare the models, 80% of the samples were used as the training set and 20% as the test set, and the ROC curves of the three models were plotted. The ROC curve of the random forest model was closer to the upper left corner and lay completely outside the ROC curves of the logistic regression model and the CART model, indicating that the classification ability of the random forest model was better than that of the other two models.

Conclusions: The risk factors of TTP are complex and include sociodemographic characteristics, menstruation, birth history, lifestyle and psychosocial pressure. These relationships persisted after adjustment for the other covariates. In addition, among the data mining methods compared, the random forest algorithm performed better than the traditional logistic regression model and the CART model. Applying this algorithm to fertility prediction has good practical value.
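The Kaplan-Meier evaluation of TTP described in the Methods (cumulative conception probability and median time to conception) can be illustrated with a minimal sketch. This is not the dissertation's code: the function names `cumulative_pregnancy_probability` and `median_ttp` and the toy follow-up records are hypothetical, and TTP is assumed to be recorded in whole months, with women who did not conceive during follow-up treated as censored.

```python
from collections import Counter


def cumulative_pregnancy_probability(records):
    """Kaplan-Meier estimate of the cumulative conception probability.

    records: list of (months, conceived) pairs. conceived=False means the
    woman was censored at that month (follow-up ended without pregnancy).
    Returns [(month, cumulative probability)] at each month with a pregnancy.
    """
    events = Counter(t for t, conceived in records if conceived)
    exits = Counter(t for t, _ in records)  # pregnancies plus censorings
    at_risk = len(records)
    not_pregnant = 1.0  # Kaplan-Meier "survival": probability of no pregnancy yet
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Standard product-limit step: multiply by (1 - d / n at risk).
            not_pregnant *= 1 - d / at_risk
            curve.append((t, 1 - not_pregnant))
        at_risk -= exits[t]
    return curve


def median_ttp(curve):
    """First month at which the cumulative probability reaches 50%, else None."""
    for t, p in curve:
        if p >= 0.5:
            return t
    return None


# Hypothetical toy data, for illustration only: (months of follow-up, conceived?)
toy_records = [(1, True), (2, True), (2, False), (3, True),
               (5, True), (6, False), (8, True), (12, False)]
toy_curve = cumulative_pregnancy_probability(toy_records)
for month, prob in toy_curve:
    print(f"month {month}: cumulative probability {prob:.3f}")
print("median TTP (months):", median_ttp(toy_curve))
```

Censored women stay in the risk set until their last follow-up month, which is why the Kaplan-Meier estimate differs from the simple proportion of women who conceived.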
Keywords/Search Tags:Waiting time to pregnancy, fertility, risk factor analysis, prediction model