
Development And Validation Of Models For The Prediction Of Assisted Reproduction Outcomes Based On Machine Learning Algorithms

Posted on: 2024-05-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W T Rao
GTID: 1524307319462314
Subject: Obstetrics and gynecology
Abstract/Summary:
Objective: This study aims to construct prediction models for clinical outcomes both before in vitro fertilization and embryo transfer (IVF-ET) or intracytoplasmic sperm injection (ICSI) treatment and after the egg retrieval procedure, using machine learning algorithms applied to a large prospective cohort. The variables used in the models are selected from comprehensive baseline and follow-up data. The study also seeks to identify genes associated with recurrent implantation failure (RIF) following treatment and to build a predictive classification model based on those genes.

Methods: This study involved patient couples who underwent IVF/ICSI treatment for infertility at the Reproductive Centre of Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, and who were enrolled in the Tongji Reproductive and Environmental Cohort between January 2019 and January 2020. Their baseline, treatment, and embryo data were collected, and the couples were followed up until the end of the ongoing cycle. Patients were allocated to training, test, and validation sets in an 8:1:1 ratio, and models predicting clinical pregnancy in the current cycle before treatment were built with eight machine learning algorithms. These models were internally validated by ten-fold cross-validation. Performance and reliability were evaluated with several metrics: accuracy, precision, recall, F1 score, Brier score, probability calibration curve, confusion matrix, and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Based on the results, the best-performing algorithms were selected to construct prediction models for clinical pregnancy and live birth after the egg retrieval procedure. The optimal model was then externally validated using data from patients newly enrolled in the cohort from April 2021 to April 2022. Furthermore, RIF-related datasets from the GEO database were analyzed to identify differentially expressed genes. A machine learning algorithm was then used to build a predictive classification model that differentiates the RIF group from controls, and a competitive endogenous RNA (ceRNA) network was constructed from the same genes.

Results: A total of 1321 patient couples experiencing infertility were enrolled for training the prediction models, and 340 patient couples were enrolled as an external validation set. To predict clinical pregnancy before treatment, 34 feature variables were initially selected from the original 379 baseline feature variables using extreme gradient boosting. The random forest model without feature selection performed best, with a mean accuracy of 77.6% and a mean AUC of 0.756 in ten-fold cross-validation, and an accuracy of 79.1% and an AUC of 0.690 in external validation. For predicting clinical pregnancy and live birth after egg retrieval, 17 and 59 feature variables, respectively, were selected from the original 424 feature variables using extreme gradient boosting. The random forest model without feature selection performed best for predicting clinical pregnancy after egg retrieval, with a mean accuracy of 83.1% and a mean AUC of 0.891 in ten-fold cross-validation. The recursive feature elimination-random forest model with 41 variables performed best for predicting live birth after egg retrieval, with a mean accuracy of 80.2% and a mean AUC of 0.865 in ten-fold cross-validation. In external validation, the most accurate prediction models had accuracies of 82.4% and 80.6%, and AUCs of 0.838 and 0.865, respectively. From the RIF-related datasets in the GEO database, six differentially expressed mRNAs were screened (GAS1, PAPSS2, BIRC3, UNC5B, SLC4A7, and PTGS2), along with nine lncRNAs and six pseudogenes. A predictive classification model built on the six mRNAs with the support vector machine algorithm achieved accuracies of 100%, 75%, 100%, and 100%, with AUCs of 1.000, 0.889, 1.000, and 1.000, respectively, in the four test sets. On the external validation set, the model achieved an accuracy of 83.3% with an AUC of 0.857. A ceRNA regulatory network was constructed from these differentially expressed genes, and each gene in the network was differentially expressed between the RIF group and the control group.

Conclusions: The random forest algorithm showed superior performance in predicting clinical outcomes both before IVF/ICSI treatment and after the egg retrieval procedure, with 17 to 41 variables included. Moreover, six differentially expressed genes may be implicated in RIF after treatment and can be used to predict adverse outcomes following treatment.

Part Ⅰ: Development and validation of clinical pregnancy prediction models prior to the initiation of assisted reproductive treatment

Objective: Machine learning has shown promising results in assisted reproduction for predicting outcomes such as fertilization probability, embryo selection, and live birth probability after IVF/ICSI. However, few machine learning models have assessed clinical pregnancy success rates before treatment begins. To address this gap, we developed a machine learning model that uses rich baseline clinical characteristics to predict clinical pregnancy success before the start of IVF/ICSI treatment. By providing more accurate and reliable predictions for assisted reproduction treatment, such a model has the potential to reduce the financial burden and the physical and mental stress of patients.

Methods: This study involved infertile couples who received IVF/ICSI treatment at the Reproductive Center of Tongji Hospital, affiliated with Tongji Medical College, Huazhong University of Science and Technology. The patients were enrolled in the Tongji Reproductive and Environmental Cohort between January 2019 and January 2020, and their baseline, treatment, and embryo data were collected and followed up until the end of the current cycle. To construct the prediction models for clinical pregnancy in the current cycle, patients were randomly assigned to the training, test, and validation sets in an 8:1:1 ratio. Multiple machine learning algorithms were used to construct the prediction models with ten-fold cross-validation. Eight commonly used machine learning models were selected: the generalized linear model, naive Bayesian model, decision tree model, K-nearest neighbor model, support vector machine model, random forest model, extreme gradient boosting model, and deep neural network model. Performance and reliability were evaluated with several metrics: accuracy, precision, recall, F1 score, Brier score, probability calibration curve, confusion matrix, and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The optimal model was then externally validated using data from patients newly enrolled in the cohort from April 2021 to April 2022.

Results: The study included a total of 1321 couples with infertility, of whom 947 (71.7%) achieved clinical pregnancy in the current cycle. After screening 379 baseline characteristic variables with extreme gradient boosting, 34 variables were initially selected for subsequent model construction. Among the machine learning models evaluated, the generalized linear model incorporating seven feature variables demonstrated excellent interpretability. However, the random forest model without feature selection performed best, with an average accuracy of 77.6% and an average AUC of 0.756 in ten-fold cross-validation, and an accuracy of 79.1% and an AUC of 0.690 in external validation.

Conclusions: The random forest model without feature selection is effective in predicting clinical pregnancy before the initiation of IVF/ICSI treatment.

Part Ⅱ: Development and validation of prediction models for clinical pregnancy and live birth after egg retrieval procedure

Objective: This study focused on constructing machine learning models that predict the probability of clinical pregnancy and live birth in the current cycle following the IVF/ICSI egg retrieval procedure, with additional treatment-related characteristic variables and intermediate outcome variables included in model construction.

Methods: Patients from the population included in the first part of the study were randomly assigned to training, test, and validation sets in an 8:1:1 ratio, and ten-fold cross-validation was used to evaluate model performance. Building on the findings of the first part, machine learning models with good interpretability, high performance, and good classification in large feature spaces were selected for further analysis: the generalized linear model, the random forest model, and the deep neural network model. To further improve the prediction models, recursive feature elimination and a hill climbing algorithm were applied to screen and select the most informative feature variables. The optimal model was then externally validated using data from patients newly enrolled in the cohort from April 2021 to April 2022.

Results: After comprehensive screening of baseline, treatment, and embryo data, 17 and 59 variables were selected as input for the prediction models of clinical pregnancy and live birth probability in the current cycle, respectively. Model performance was assessed using ten-fold cross-validation. The random forest model without feature selection was the best model for predicting clinical pregnancy, achieving a mean accuracy of 83.1% and a mean AUC of 0.891. The recursive feature elimination-random forest model was the best model for predicting live birth, with a mean accuracy of 80.2% and a mean AUC of 0.865. In external validation, the most accurate prediction models had accuracies of 82.4% and 80.6%, and AUCs of 0.838 and 0.865, respectively.

Conclusions: The random forest model is effective in predicting clinical pregnancy and live birth after the IVF/ICSI egg retrieval procedure.

Part Ⅲ: Development and validation of predictive classification models for recurrent implantation failure and construction of competitive endogenous RNA network

Objective: Some patients with a high predicted probability of success in the preceding parts ultimately failed treatment, which suggests that other, unknown factors contributed to the treatment outcome. This study seeks to identify genes from the GEO database that may be linked to RIF following IVF/ICSI treatment. It then aims to build a machine learning predictive classification model that differentiates between normal and RIF populations, and to construct a RIF-associated ceRNA network.

Methods: RIF-related microarray datasets were retrieved from the GEO database and analyzed. Common differentially expressed genes were screened and classified into mRNAs, lncRNAs, and pseudogenes using GeneCards. Machine learning models were constructed with the support vector machine algorithm on the differentially expressed mRNAs to distinguish the control and RIF groups, and were validated using test sets and an independent external validation set. Interactions with miRNAs were predicted using miRWalk and DIANA-LncBase v3 for the differentially expressed mRNAs and lncRNAs, and lncRNA-miRNA-mRNA ceRNA regulatory networks were constructed.

Results: Six common differentially expressed mRNAs (GAS1, PAPSS2, BIRC3, UNC5B, SLC4A7, and PTGS2), along with nine lncRNAs and six pseudogenes, were screened in this study. A predictive classification model built on the six mRNAs with the support vector machine algorithm achieved accuracies of 100%, 75%, 100%, and 100%, with AUCs of 1.000, 0.889, 1.000, and 1.000, respectively, in the four test sets. On the external validation set, the model achieved an accuracy of 83.3% with an AUC of 0.857. A lncRNA-miRNA-mRNA ceRNA regulatory network was constructed from these differentially expressed genes, and each gene in the network was differentially expressed between the RIF and control groups.

Conclusions: Six genes, namely GAS1, PAPSS2, BIRC3, UNC5B, SLC4A7, and PTGS2, were found to be differentially expressed and are potentially associated with poor outcomes after IVF/ICSI treatment. The classification model based on these genes can differentiate the RIF group from the control group. Additionally, the ceRNA networks may offer novel insights into the pathogenesis and management of RIF.
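
The threshold-based and probability-based metrics named in the Methods (accuracy, precision, recall, F1 score, Brier score, and AUC), together with a ten-fold split, can be illustrated with a minimal sketch. This is not the dissertation's code: the function names (`evaluate`, `roc_auc`, `k_fold_indices`), the 0.5 decision threshold, the unshuffled contiguous folds, and the toy labels and probabilities are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute the scalar metrics from labels and predicted probabilities.
    The 0.5 threshold is an assumption, not a value from the source."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Brier score: mean squared gap between probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "brier": brier,
        "auc": roc_auc(y_true, y_prob),
    }


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds of near-equal size and
    return (train_indices, test_indices) pairs. Real use would shuffle
    (and usually stratify) the indices first; that step is omitted here."""
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Hypothetical toy data: 1 = clinical pregnancy, 0 = no clinical pregnancy.
TOY_Y_TRUE = [1, 1, 1, 0, 0, 1, 0, 1]
TOY_Y_PROB = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]

if __name__ == "__main__":
    for name, value in evaluate(TOY_Y_TRUE, TOY_Y_PROB).items():
        print(f"{name}: {value:.3f}")
    print("fold sizes:", [len(test) for _, test in k_fold_indices(1321, 10)])
```

In a cross-validated workflow, `evaluate` would be called once per held-out fold and the per-fold values averaged, which is how a "mean accuracy" and "mean AUC" are typically obtained. The calibration curve and confusion matrix listed in the Methods are graphical or tabular summaries and are left out of this sketch.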
Keywords/Search Tags: Assisted reproductive technology, In vitro fertilization, Intracytoplasmic sperm injection, Recurrent implantation failure, Machine learning, Prediction model