
Comparative Study Of Machine Learning Method Based On DS Screening Big Data And Optimization Of Risk Assessment Scheme For DS Screening

Posted on: 2021-03-25 | Degree: Master | Type: Thesis
Country: China | Candidate: X N Hu
GTID: 2404330623477514 | Subject: Cell biology
Abstract/Summary:
Objective: To retrospectively analyze the efficacy of first- and second-trimester screening for Down syndrome (DS) in the local population; to indigenize the medians of DS screening markers (that is, derive them from the local population) and optimize DS screening risk assessment through statistical analysis of medical big data; to analyze the value of optimizing the DS screening risk assessment program by converting DS screening risk into absolute risk (AR); and to establish a DS screening risk assessment model with machine learning methods and evaluate the application value of machine learning in DS screening.

Methods: Data were collected for 16,340 pregnant women who underwent first-trimester screening at the Center of Prenatal Diagnosis, the First Hospital of Jilin University, from December 12, 2012 to October 31, 2017, and for 100,138 pregnant women who underwent second-trimester screening from July 8, 2010 to November 13, 2017. Of these, 14,316 and 99,851 cases were included in the first-trimester combined screening and second-trimester triple screening analyses, respectively. The second-trimester triple screening database was used to build the machine learning DS screening model, because machine learning requires a large database and the first-trimester combined screening cohort was too small. For the statistical analysis of the efficiency of DS screening and absolute risk screening, women lost to follow-up or with incomplete information were further excluded, leaving 13,702 and 80,577 women in the first- and second-trimester screening groups, respectively. When establishing the database of indigenized screening marker medians, women with other factors that may affect the screening markers were also excluded, leaving 13,521 and 55,686 women in the first- and second-trimester screening groups, respectively. Maternal serum marker levels were measured by time-resolved fluoroimmunoassay, and fetal nuchal translucency (NT) was measured by ultrasound. The machine learning models were built on Windows 7 Professional 64-bit with Python 3.6.3, pandas 0.20.3 and scikit-learn 0.19.1. The AR value was calculated by dividing the combined risk by the maternal age risk. The NT median was fitted against gestational age with quadratic, logarithmic-quadratic and log-sigmoid models. The medians of the second-trimester serum markers were fitted against gestational age (GA) with quadratic, logarithmic-quadratic, linear and logarithmic-linear models, and the medians of the multiples of the median (MoM) were fitted against maternal weight with quadratic, logarithmic-quadratic, logarithmic-linear and reciprocal-linear models.

Results:
1. In the first-trimester combined screening group there were 14,316 women with singleton pregnancies, with a mean age of 29.11 ± 2.96 years; 199 (1.39%) were of advanced maternal age (AMA), and the mean weight was 58.95 ± 9.83 kg. In the second-trimester triple screening group there were 99,851 women with singleton pregnancies, with a mean age of 27.76 ± 4.03 years; 3,743 (3.75%) were of advanced maternal age, and the mean weight was 60.68 ± 10.28 kg.
2. Of the 13,703 women with effective follow-up and complete information in the first-trimester screening, 230 were high risk and 3 fetuses were diagnosed with DS. The detection rate (DR) was 100%, the false positive rate (FPR) 1.66%, the positive predictive value (PPV) 1.30% and the negative predictive value (NPV) 100%. Of the 80,577 women with effective follow-up and complete fetal information in the second-trimester screening, 4,191 were high risk, and 20 of them carried fetuses diagnosed with DS. The DR was 64.52%, FPR 5.18%, PPV 0.48% and NPV 99.99%. First-trimester screening had a significantly higher DR and a significantly lower FPR than second-trimester screening (p < 0.01).
3. With an AR cut-off value of 3, the DR of first-trimester combined screening and second-trimester screening was 100% and 61.29%, and the FPR was 1.77% and 5.98%, respectively, with no statistically significant difference from the original screening results (p > 0.05). However, the FPR decreased significantly as the AR cut-off value increased.
4. All three models fitting the NT median against crown-rump length (CRL) performed very well, with adjusted R² values greater than 0.95, and the log-sigmoid model was the best. The indigenized NT median fluctuated within 1.0 ± 0.02, which was very stable and ideal. A paired t-test showed a significant difference from the built-in NT median (t = 7.353, p < 0.001). However, the distributions of both NT MoM and log10(NT MoM) deviated from normality.
5. For the median regressions against GA, adjusted R² exceeded 0.992 for AFP (best model: logarithmic-quadratic), 0.98 for free β-hCG (best model: logarithmic-quadratic) and 0.995 for uE3 (best model: quadratic). For the MoM median regressions against maternal weight, adjusted R² for AFP was above 0.99 with the reciprocal model best; the logarithmic-quadratic model was best for free β-hCG MoM, and the reciprocal-linear model was best for the uE3 MoM median. Paired t-tests showed significant differences between the indigenized and built-in values (p < 0.05).
6. DS screening risk calculated with indigenized Delta NT gave a DR of 100% and an FPR of 3.6% at a risk cut-off value of 1/270; at a cut-off value of 1/262, the DR was 100% and the FPR only 3.31%.
7. Second-trimester screening risk calculated with the indigenized database gave a DR of 80.65% and an FPR of 10.22% at a cut-off value of 1/270, that is, DR increased by 19.45% and FPR by 3.05%. At the same FPR, the DR of screening with indigenized data was 77.42%, 16.13% higher than with the built-in parameters.
8. The classification and regression tree (CART) algorithm combined with the adaptive boosting (AdaBoost) algorithm and the synthetic minority oversampling technique (SMOTE)-Tomek algorithm raised the DR of DS screening above 95%. The support vector machine (SVM) achieved a higher DR than CART, and SVM combined with SMOTE-Tomek achieved a DR of 100% with an FPR of only 1.83%.

Conclusions:
1. First-trimester combined screening and second-trimester triple screening can effectively prevent the birth of children with DS and reduce the incidence of birth defects, and first-trimester combined screening is more efficient than second-trimester triple screening.
2. The absolute risk value can serve as a reference for DS screening risk assessment and can reduce the screening FPR to a certain extent.
3. Indigenized screening markers can effectively improve the DR and reduce the FPR. The MoM method should not be used to standardize the indigenized NT.
4. Machine learning methods can significantly improve the screening DR, reduce the FPR and improve DS screening efficiency.
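The screening indices reported for the first- and second-trimester programs all follow from a 2×2 table of screen result against confirmed outcome. The sketch below uses plain Python, and `ScreeningCounts`, `screening_metrics` and `counts_from_summary` are hypothetical helper names. It rebuilds that table from the counts in the abstract and recomputes DR, FPR, PPV and NPV. For the second trimester the abstract does not state how many DS cases were missed. The value 11 is an assumption inferred from DR = 64.52% (20/31), and it reproduces the reported FPR, PPV and NPV.

```python
from typing import NamedTuple


class ScreeningCounts(NamedTuple):
    """2x2 counts for a screening test against the confirmed DS outcome."""
    tp: int  # high risk and DS confirmed
    fp: int  # high risk but not DS
    fn: int  # low risk but DS confirmed (missed case)
    tn: int  # low risk and not DS


def screening_metrics(c: ScreeningCounts) -> dict:
    """Return DR, FPR, PPV and NPV as percentages."""
    return {
        "DR": 100 * c.tp / (c.tp + c.fn),   # detection rate (sensitivity)
        "FPR": 100 * c.fp / (c.fp + c.tn),  # false positive rate
        "PPV": 100 * c.tp / (c.tp + c.fp),  # positive predictive value
        "NPV": 100 * c.tn / (c.tn + c.fn),  # negative predictive value
    }


def counts_from_summary(total: int, high_risk: int, ds_in_high_risk: int,
                        ds_missed: int) -> ScreeningCounts:
    """Build the 2x2 table from cohort size, screen positives and DS cases."""
    fp = high_risk - ds_in_high_risk
    tn = total - high_risk - ds_missed
    return ScreeningCounts(ds_in_high_risk, fp, ds_missed, tn)


# First trimester: 13,703 women, 230 high risk, 3 DS, none missed (DR 100%).
first = screening_metrics(counts_from_summary(13703, 230, 3, 0))
# Second trimester: 80,577 women, 4,191 high risk, 20 DS among them.
# ASSUMPTION: 11 missed DS cases, inferred from the reported DR (20/31).
second = screening_metrics(counts_from_summary(80577, 4191, 20, 11))

for name, m in (("first", first), ("second", second)):
    print(name, {k: round(v, 2) for k, v in m.items()})
```

Using 13,702 (the figure given in Methods) instead of 13,703 for the first trimester gives the same rounded values.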
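The AR definition in Methods divides the combined risk by the maternal age risk. The following is a minimal sketch. It assumes that risks are given as exact fractions such as 1/150. It also assumes that a pregnancy is flagged when AR is at or above the cut-off, because the abstract does not state the direction of the comparison. The helper names are hypothetical.

```python
from fractions import Fraction


def absolute_risk(combined_risk: Fraction, age_risk: Fraction) -> Fraction:
    """AR = combined screening risk / maternal age-related risk."""
    if age_risk <= 0:
        raise ValueError("age risk must be positive")
    return combined_risk / age_risk


def flag_by_ar(combined_risk: Fraction, age_risk: Fraction, cutoff: float = 3) -> bool:
    """ASSUMPTION: a pregnancy is called high risk when AR >= cutoff."""
    return absolute_risk(combined_risk, age_risk) >= cutoff


# Hypothetical pregnancy: screening risk 1/150, age-related risk 1/900.
ar = absolute_risk(Fraction(1, 150), Fraction(1, 900))
print(ar, flag_by_ar(Fraction(1, 150), Fraction(1, 900)))
```

Under this rule, the set of pregnancies with AR ≥ c shrinks as c grows. As a result, neither the FPR nor the DR can rise when the cut-off is raised, which is the direction reported for the FPR.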
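The median regressions fit a curve of marker median against gestational age and then express each measurement as a multiple of the fitted median. The sketch below shows the logarithmic-quadratic variant with NumPy, assuming GA in days and a base-10 log. It also computes the adjusted R² used to compare models. The data are synthetic values generated from a known curve rather than study data, and the function names are hypothetical. Since the thesis concludes that MoM should not be used for indigenized NT, this sketch applies only to serum markers such as AFP.

```python
import numpy as np


def fit_log_quadratic(ga_days, medians):
    """Fit log10(median) = b0 + b1*GA + b2*GA^2; return (coeffs, adjusted R^2)."""
    x = np.asarray(ga_days, dtype=float)
    y = np.log10(np.asarray(medians, dtype=float))
    coeffs = np.polyfit(x, y, deg=2)            # highest power first: [b2, b1, b0]
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    n, p = len(x), 2                            # p = number of predictors (GA, GA^2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return coeffs, adj_r2


def expected_median(coeffs, ga_days):
    """Back-transform the fitted log10 curve to the marker's own units."""
    return 10 ** np.polyval(coeffs, ga_days)


def mom(value, coeffs, ga_days):
    """Multiple of the median: observed value / expected median at that GA."""
    return value / expected_median(coeffs, ga_days)


# SYNTHETIC weekly medians from a known curve plus small noise; not study data.
ga = np.arange(105, 147, 7)                     # 15+0 to 20+0 weeks, in days
true_coeffs = np.array([-1e-5, 0.01, 0.5])      # hypothetical [b2, b1, b0]
noise = np.array([0.002, -0.001, 0.0, 0.001, -0.002, 0.001])
medians = 10 ** (np.polyval(true_coeffs, ga) + noise)

coeffs, adj_r2 = fit_log_quadratic(ga, medians)
print(np.round(coeffs, 6), round(float(adj_r2), 4))
print(round(float(mom(2 * expected_median(coeffs, 120), coeffs, 120)), 3))
```

Comparing candidate models (quadratic, linear, logarithmic-linear) would repeat the fit with a different design and keep the highest adjusted R².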
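For maternal weight, the reciprocal-linear model (median MoM = a + b / weight) was reported best for AFP and uE3. The sketch below fits that model by least squares and then divides a raw MoM by the MoM expected at the woman's weight. That division is the usual way such a curve is applied, but it is an assumption here, since the abstract does not give the correction formula. The weight bands and coefficients are synthetic, and the helper names are hypothetical.

```python
import numpy as np


def fit_reciprocal_linear(weights_kg, median_moms):
    """Least-squares fit of: median MoM = a + b / weight. Returns (a, b)."""
    w = np.asarray(weights_kg, dtype=float)
    design = np.column_stack([np.ones_like(w), 1.0 / w])
    solution, *_ = np.linalg.lstsq(design, np.asarray(median_moms, dtype=float), rcond=None)
    a, b = solution
    return float(a), float(b)


def weight_corrected_mom(raw_mom, weight_kg, a, b):
    """ASSUMPTION: divide the GA-based MoM by the MoM expected at this weight."""
    return raw_mom / (a + b / weight_kg)


# SYNTHETIC median MoMs by weight band, generated from a = 0.4, b = 36
# (so the expected MoM is exactly 1.0 at 60 kg); not study data.
weights = np.array([45, 50, 55, 60, 65, 70, 80, 90], dtype=float)
median_moms = 0.4 + 36.0 / weights

a, b = fit_reciprocal_linear(weights, median_moms)
print(round(a, 4), round(b, 4))
print(round(weight_corrected_mom(1.2, 45.0, a, b), 4))
```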
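The comparisons between indigenized and built-in medians use paired t-tests, which work on the differences between matched values (for example, both medians at the same CRL or GA point). The sketch below computes only the t statistic and degrees of freedom with the standard library. The paired values are hypothetical, and `paired_t` is an illustrative name. For a p-value, `scipy.stats.ttest_rel` performs the same test.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    # t = mean difference / standard error of the mean difference
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# HYPOTHETICAL paired medians at five matched points; not study data.
indigenized = [1.2, 1.3, 1.4, 1.5, 1.6]
built_in = [1.1, 1.25, 1.3, 1.45, 1.5]
t, df = paired_t(indigenized, built_in)
print(round(t, 3), df)
```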
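The machine learning models pair a classifier (CART with AdaBoost, or SVM) with SMOTE-Tomek resampling, because DS cases are a tiny minority of the cohort. In the Python ecosystem, these pieces are available as `sklearn.tree.DecisionTreeClassifier`, `sklearn.ensemble.AdaBoostClassifier`, `sklearn.svm.SVC` and imbalanced-learn's `imblearn.combine.SMOTETomek`. The abstract does not say which implementation of SMOTE-Tomek was used. The NumPy sketch below reimplements only the resampling step to show what it does. It applies SMOTE (interpolating new minority points toward their k nearest minority neighbours) and then removes Tomek links (mutual nearest neighbours with different labels). Here both members of each link are dropped; removing only the majority member is the other common variant. The features are synthetic, and classifier training is not shown. Resampling belongs on the training split only, so evaluation keeps the true prevalence.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a and rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)


def smote(x_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority rows by interpolating toward k-NN neighbours."""
    if len(x_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(rng)
    d = pairwise_sq_dists(x_min, x_min)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    k = min(k, len(x_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(x_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return x_min[base] + gap * (x_min[picked] - x_min[base])


def remove_tomek_links(x, y):
    """Drop both members of every Tomek link (mutual nearest neighbours, different labels)."""
    d = pairwise_sq_dists(x, x)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    linked = (nn[nn] == np.arange(len(x))) & (y != y[nn])
    return x[~linked], y[~linked]


def smote_tomek(x, y, minority=1, k=5, rng=None):
    """Oversample the minority class to parity with SMOTE, then clean with Tomek links."""
    x_min = x[y == minority]
    n_new = int((y != minority).sum() - len(x_min))
    synthetic = smote(x_min, n_new, k=k, rng=rng)
    x_res = np.vstack([x, synthetic])
    y_res = np.concatenate([y, np.full(n_new, minority)])
    return remove_tomek_links(x_res, y_res)


# SYNTHETIC two-feature data (standing in for, e.g., log MoM values): 190 controls, 10 cases.
gen = np.random.default_rng(0)
x = np.vstack([gen.normal(0.0, 1.0, (190, 2)), gen.normal(2.5, 1.0, (10, 2))])
y = np.array([0] * 190 + [1] * 10)
x_res, y_res = smote_tomek(x, y, rng=1)
print(np.bincount(y), np.bincount(y_res))
```

Each Tomek link contains one sample of each class, so the classes stay balanced after cleaning. The resampled `x_res`, `y_res` would then be passed to the chosen classifier.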
Keywords/Search Tags: Down syndrome, prenatal screening, absolute risk, machine learning method, detection rate