
Study On The Association Of Obesity With Prediabetes And Diabetes Risk And The Construction Of Prediction Model

Posted on: 2024-03-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Zhu
GTID: 1524307364469134
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background and Objectives: The onset of diabetes mellitus (DM) is insidious, with few early symptoms or signs. About half of patients are unaware of their condition in the early stage, so DM complications may develop before a definite diagnosis is made. DM and its complications impose a growing burden on China and remain a continuing challenge. Targeted prevention and control strategies that can reliably identify the ‘high-risk’ population for DM are therefore needed; only then can prevention and control deliver greater benefit and reduce the burden of the disease. Prediabetes (PDM) is regarded as one of the ‘high-risk’ groups for DM, but few studies have examined the characteristics of different PDM subgroups and their associations with different obesity indicators. In addition, the prevalence of DM among people with overweight or obesity is rising year by year, so a ‘one size fits all’ approach to obesity in DM prevention and control is not reasonable. Classifying obesity helps identify the ‘high-risk’ obese population and allocate limited resources effectively.

This study therefore aimed to identify the ‘high-risk’ population for DM effectively. First, the potentially distinct characteristics of PDM subgroups in the Chinese population were investigated comprehensively, and the associations between PDM subgroups and different obesity indicators were evaluated. Second, focusing on obesity, the associations between various body components and elevated blood glucose were explored systematically: the body-composition characteristics of people with elevated blood glucose were described, the optimal cut-off points of body components for preventing elevated blood glucose were identified, and obesity phenotypes were classified to propose a new approach to delineating the ‘high-risk’ obese population in DM prevention. Third, bioinformatics analysis and high-throughput sequencing were used to explore potential functional single nucleotide polymorphism (SNP) loci of core genes shared by obesity and DM as genetic markers. Finally, DM risk prediction models were constructed with machine learning algorithms, providing a strong theoretical basis and a practical tool for the early identification, effective prevention, and control of the ‘high-risk’ population.

Methods:
1. A longitudinal study was conducted in Jiangsu Province, China, as part of the project ‘early identification, early diagnosis techniques and pointcuts of diabetes risk factors’, a major chronic noncommunicable disease prevention and control project of the Ministry of Science and Technology. The baseline survey was conducted from April to July 2017, and two follow-up surveys were conducted from July to August 2018 and from July to August 2020. Generalized estimating equations (GEE) were used to fit the repeated-measurement data and to evaluate the factors associated with PDM and its subtypes.
2. Bioelectrical impedance analysis (BIA) was added in the second follow-up survey to measure body composition (lean body mass, fat mass, percent fat, skeletal muscle, and visceral fat). Anthropometric prediction equations (APE) were also used to calculate predicted lean body mass, predicted fat mass, and predicted percent fat. In a cross-sectional analysis, agreement between BIA and APE for lean body mass, fat mass, and percent fat was evaluated with the intraclass correlation coefficient (ICC) and Bland-Altman analysis. Logistic regression was used to analyze the association between body composition and elevated blood glucose, and restricted cubic splines were used to explore the dose-response relationship between them.
3. Based on the cohort, 4195 participants without DM were classified into obesity phenotypes and followed up. The Cochran-Armitage trend test was used to analyze the trend in DM incidence density across obesity phenotypes. GEE models with a binomial distribution, a log link, and an exchangeable correlation structure were applied to the pooled sample to estimate the effects of the obesity phenotypes on DM risk and to identify ‘high-risk’ obesity phenotypes.
4. Gene expression microarray datasets associated with obesity and with type 2 diabetes (T2DM) were downloaded separately from a gene expression database. Weighted gene co-expression network analysis (WGCNA) was used to identify co-expression modules and genes shared by obesity and DM. ClueGO was used for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, and cytoHubba and CentiScaPe were used to identify core genes among the shared genes, yielding six new candidate genes. SNPinfo and RegulomeDB were then used to screen 17 potential functional SNPs in these hub genes. A nested case-control study enrolled 356 patients with newly diagnosed DM and 542 healthy controls, and SNPs were genotyped after genomic DNA extraction. Hardy-Weinberg equilibrium was assessed for each SNP with a goodness-of-fit test, and associations between the target SNPs and DM risk were assessed by multivariable logistic regression.
5. Based on the prospective cohort, the non-DM population and the PDM population were followed up, yielding two datasets of baseline information and follow-up outcomes, from which a DM risk prediction model and a PDM prognosis prediction model were constructed, respectively. For each dataset, prediction models were built with machine learning (light gradient boosting machine [LightGBM], extreme gradient boosting [XGBoost], and random forest), deep learning (feedforward neural network and convolutional neural network), and traditional logistic regression. Model performance was evaluated in three respects: discrimination, calibration, and clinical applicability. The model was further optimized according to the importance and accessibility of features to improve its practical value.
6. Statistical analysis: Depending on variable type, data were described as mean ± standard deviation, median (interquartile range), or frequency (percentage). The χ² test was used for categorical variables, and the t test or Wilcoxon test for comparisons of continuous variables between two groups. Statistical description, statistical inference, and WGCNA were performed in R (version 4.1.1); the prediction models were trained and analyzed in Python (version 3.8.0). The significance level was α = 0.05, and P ≤ 0.05 was considered statistically significant.

Results:
1. A total of 5,713 (58.42%) observations met the criteria for prediabetes (IFG, 38.07%; IGT, 26.51%; elevated HbA1c, 23.45%), and 9.66% of those with prediabetes fulfilled all three American Diabetes Association (ADA) criteria. Among demographic characteristics, the association with older age was most evident for elevated HbA1c (OR = 2.85, 95% CI = 2.42-3.35). Women were less likely than men to have IFG (OR = 0.70, 95% CI = 0.61-0.81) and more likely to have IGT (OR = 1.41, 95% CI = 1.20-1.65). The associations of biochemical characteristics and obesity indicators were inconsistent across the prediabetes criteria. The body adiposity estimator showed a strong association with prediabetes (D10: OR = 4.05, 95% CI = 3.02-5.42). For IFG and elevated HbA1c, predicted lean body mass showed higher odds than the other indicators (D10: OR = 3.34, 95% CI = 1.92-5.81; OR = 3.64, 95% CI = 1.92-6.91, respectively). For IGT, predicted percent fat showed the highest odds (D10: OR = 6.58, 95% CI = 4.33-10.00).
2. Agreement between APE and BIA was high for lean body mass (ICC = 0.84) and fat mass (ICC = 0.82) and moderate for percent fat (ICC = 0.67). In men, multiple logistic regression showed that increases in fat mass, percent fat, or BMI were positively associated with elevated blood glucose (P for trend < 0.01), whereas lean body mass, skeletal muscle, and visceral fat were not significantly associated with blood glucose (P for trend > 0.05); the optimal cut-off for percent fat in men was 22%. In women, increases in fat mass, percent fat, visceral fat, and BMI were positively associated with elevated blood glucose (P for trend < 0.01), whereas lean body mass and skeletal muscle were not (P for trend > 0.05); a percent fat above 35.01% can be regarded as the ‘danger range’ for elevated blood glucose.
3. A total of 3641 participants were included in the cohort study, and 9623 observations were pooled for the longitudinal analysis. The mean follow-up was 1.64 years per person, and the overall incidence density of diabetes was 6.94 per 100 person-years. Compared with the metabolically healthy normal-weight phenotype, the metabolically healthy overweight phenotype had a lower diabetes risk (RR = 0.65, 95% CI = 0.47-0.90), and no significant association was found for metabolically healthy obesity (MHO) (RR = 0.99, 95% CI = 0.63-1.53). In contrast, risk was elevated for the metabolically unhealthy normal-weight (MU-NW) (RR = 1.81, 95% CI = 1.28-2.55), metabolically unhealthy overweight (MU-OW) (RR = 2.02, 95% CI = 1.57-2.61), and metabolically unhealthy obesity (MUO) (RR = 2.48, 95% CI = 1.89-3.26) phenotypes.
4. Bioinformatics analysis systematically identified modules related to obesity and T2DM and their shared genes. There were 116 genes shared by the obesity- and T2DM-related modules; GO analysis showed that many of them are involved in immune processes, and KEGG analysis showed enrichment mainly in the peroxisome pathway. Six core genes were selected as new candidate genes: ICAM1, CSF1R, TLR8, NOD2, ARG1, and MAPK14.
5. A total of 17 functional SNPs in the hub genes were screened in the Chinese Han population. The nested case-control study identified potential associations between 2 SNPs and DM susceptibility: the variant T allele of rs923366 in ICAM1 was significantly associated with a reduced risk of DM (additive model: OR = 0.80, 95% CI = 0.65-0.99), as was the variant T allele of rs41497048 in CSF1R (additive model: OR = 0.72, 95% CI = 0.52-0.98).
6. Based on the prospective cohort, a DM risk prediction model and a PDM prognosis prediction model were constructed with five machine learning and deep learning algorithms and with logistic regression. In the two datasets, the AUC of LightGBM was 0.848 and 0.804, respectively, and LightGBM had the best overall performance in discrimination, calibration, and clinical applicability. A novel aspect was the inclusion of obesity indicators that capture multiple dimensions of obesity: several of them, such as the Chinese visceral adiposity index, ponderal index, body roundness index, and abdominal volume index, played key roles in different models and were important features for model construction. The LightGBM model optimized on 10 important features retained good discrimination, calibration, and clinical applicability.

Conclusions and Recommendations:
1. Some factors associated with PDM differ across diagnostic criteria, and obesity indicators differ in the strength of their associations with different PDM subtypes. Because obesity indicators are easy to measure, they can help identify target groups, and the results can inform targeted interventions and optimize preventive measures.
2. APE is convenient and easy to obtain, and predicted lean body mass and fat mass can be used in large-scale epidemiological surveys. The body-composition characteristics of people with elevated blood glucose, and the relationship between body composition and blood glucose, may offer a new way to identify people with elevated blood glucose and to support self-management of weight.
3. The MUO phenotype deserves much greater attention, and the MU-NW and MU-OW phenotypes are also important targets for prevention. Targeting people at high risk of DM can optimize preventive strategies and help curb the marked rise in DM prevalence.
4. Six new candidate genes (ICAM1, CSF1R, TLR8, NOD2, ARG1, MAPK14) shared by obesity and T2DM were identified by bioinformatics analysis and may serve as potential biomarkers of DM. The results suggest, for the first time, that ICAM1 rs923366 and CSF1R rs41497048 may alter susceptibility to DM, offering new insight into ICAM1 and CSF1R as potential targets for prevention and treatment.
5. Machine learning algorithms can yield more suitable and effective DM prediction models, and LightGBM had the best overall performance. Ranking feature importance in each model helps identify potential factors for DM prevention and control, and the importance of several obesity indicators in these models provides a new approach to identifying obese people at risk. The LightGBM DM risk prediction model was further optimized based on feature importance and accessibility, improving its practical value. Reasonable prediction of population susceptibility to DM can support early diagnosis and comprehensive prevention and provide a scientific basis for preventive measures.
Keywords/Search Tags: Diabetes, Body composition, Obesity phenotype, Single nucleotide polymorphisms, Machine learning
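The prediabetes subtypes in this abstract follow the American Diabetes Association (ADA) criteria. As a minimal sketch, assuming the standard ADA thresholds expressed in mmol/L (fasting plasma glucose 5.6-6.9 mmol/L for IFG, 2-h OGTT glucose 7.8-11.0 mmol/L for IGT, HbA1c 5.7-6.4% for elevated HbA1c), a single observation can be assigned to one or more subtypes as below. The class and function names are hypothetical, and the abstract does not state the exact cut-offs or unit conversions the dissertation used.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GlycemicProfile:
    fpg: float    # fasting plasma glucose, mmol/L
    ohpg: float   # 2-h plasma glucose in a 75-g OGTT, mmol/L
    hba1c: float  # glycated haemoglobin, %


def prediabetes_subtypes(p: GlycemicProfile) -> set[str]:
    """Return the ADA prediabetes criteria met by one observation (hypothetical helper).

    Returns {"DM"} when any single value is in the diabetes range; this is a
    simplification, since a clinical diagnosis normally needs confirmation.
    """
    if p.fpg >= 7.0 or p.ohpg >= 11.1 or p.hba1c >= 6.5:
        return {"DM"}
    met = set()
    if p.fpg >= 5.6:    # IFG: 5.6-6.9 mmol/L
        met.add("IFG")
    if p.ohpg >= 7.8:   # IGT: 7.8-11.0 mmol/L
        met.add("IGT")
    if p.hba1c >= 5.7:  # elevated HbA1c: 5.7-6.4 %
        met.add("elevated HbA1c")
    return met


# Illustrative values only: an observation meeting all three ADA criteria.
print(sorted(prediabetes_subtypes(GlycemicProfile(fpg=6.1, ohpg=8.2, hba1c=5.9))))
# prints ['IFG', 'IGT', 'elevated HbA1c']
```

Returning a set rather than a single label reflects that the subtypes overlap: one observation can count toward IFG, IGT, and elevated HbA1c at once.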
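The abstract assesses agreement between APE and BIA with the intraclass correlation coefficient and Bland-Altman analysis, but it does not say which ICC form was used. The sketch below assumes the two-way random-effects, absolute-agreement, single-measure form (Shrout and Fleiss ICC(2,1)) and 95% limits of agreement computed as mean difference ± 1.96 SD. The function names are hypothetical, and only the Python standard library is used.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def icc_2_1(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-way random-effects, absolute-agreement, single-measure ICC
    (Shrout-Fleiss ICC(2,1)) for n subjects each measured by k = 2 methods."""
    n, k = len(a), 2
    rows = list(zip(a, b))
    grand = mean(list(a) + list(b))
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in rows)    # between subjects
    ss_cols = n * sum((mean(col) - grand) ** 2 for col in (a, b))  # between methods
    ss_err = ss_total - ss_rows - ss_cols                          # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def bland_altman(a: Sequence[float], b: Sequence[float],
                 z: float = 1.96) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit): mean difference and bias +/- z * SD."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical lean body mass values (kg) for five people, not study data.
ape = [45.2, 50.1, 38.7, 55.3, 42.0]
bia = [46.0, 49.5, 40.1, 56.2, 41.3]
print(round(icc_2_1(ape, bia), 3), bland_altman(ape, bia))
```

The absolute-agreement form penalizes a systematic offset between methods, which a consistency-type ICC would ignore; Bland-Altman limits make that offset (the bias) explicit.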
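Restricted cubic splines were used for the dose-response analysis of body composition and elevated blood glucose. A minimal sketch of one common parameterization, Harrell's unscaled truncated-power basis, follows. The knot locations are hypothetical because the abstract does not report them; knots are often placed at fixed percentiles of the exposure, for example the 5th, 35th, 65th, and 95th for four knots. The function name is also hypothetical.

```python
from typing import List, Sequence


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Restricted cubic spline basis for one value x.

    Returns [x, S_1(x), ..., S_{k-2}(x)] for k knots. A model that is linear in
    these columns is cubic between knots and linear below the first knot and
    above the last one.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u: float) -> float:
        return max(u, 0.0) ** 3  # truncated cubic (u)_+^3

    denom = t[-1] - t[-2]
    basis = [x]
    for j in range(k - 2):
        basis.append(
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / denom
            + pos3(x - t[-1]) * (t[-2] - t[j]) / denom
        )
    return basis


# Hypothetical knots for percent fat (%), not values from the study.
knots = [20, 25, 30, 35]
print(rcs_basis(32.0, knots))
```

These basis columns would enter the logistic regression in place of the raw exposure; a test that all nonlinear coefficients are zero is then a test for departure from linearity.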
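Incidence density is the number of new cases divided by accumulated person-time, and the Cochran-Armitage test checks for a linear trend in proportions across ordered groups. The sketch below rests on assumptions the abstract does not confirm. It uses equally spaced scores 0, 1, 2, ... for the ordered groups, because the ordering and scores of the obesity phenotypes are not given. It also uses case counts over participants per group as the proportions; when person-time differs between groups, a Poisson trend test on rates is a common alternative. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def incidence_density(cases: int, person_years: float, per: float = 100.0) -> float:
    """New cases per `per` person-years of follow-up."""
    return cases / person_years * per


def cochran_armitage(cases: Sequence[int], totals: Sequence[int],
                     scores: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Cochran-Armitage test for a linear trend in proportions across ordered groups.

    Returns (z, two-sided p). Scores default to 0, 1, 2, ... (an assumption).
    """
    if scores is None:
        scores = range(len(cases))
    n_total = sum(totals)
    p_bar = sum(cases) / n_total  # pooled proportion
    t_stat = sum(t * (r - n * p_bar) for t, r, n in zip(scores, cases, totals))
    s1 = sum(n * t for t, n in zip(scores, totals))
    s2 = sum(n * t * t for t, n in zip(scores, totals))
    var_t = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_total)
    z = t_stat / math.sqrt(var_t)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


# Hypothetical counts for three ordered groups (not data from the study).
print(incidence_density(cases=42, person_years=600.0))  # cases per 100 person-years
print(cochran_armitage(cases=[12, 20, 31], totals=[400, 380, 350]))
```

A positive z indicates proportions that rise with the group score; the sign and size of z depend on the chosen scores, which is why the ordering assumption matters.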
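Each SNP was checked for Hardy-Weinberg equilibrium with a goodness-of-fit test. A minimal sketch of the usual Pearson chi-square version for a biallelic SNP (1 degree of freedom) follows. The function name is hypothetical, the test is conventionally run in controls (the abstract does not say in which group it was run), and an exact test is often preferred when a genotype class is rare. The sketch assumes both alleles are observed, otherwise an expected count is zero.

```python
import math
from typing import Tuple


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> Tuple[float, float]:
    """Pearson chi-square goodness-of-fit test for Hardy-Weinberg equilibrium
    at one biallelic SNP. Returns (chi-square statistic, p-value with 1 df)."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts (CC, CT, TT), not data from the study.
print(hwe_chi_square(300, 210, 32))
```

There is 1 degree of freedom because three genotype classes are constrained by the total count and one estimated allele frequency.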
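Model performance was judged on discrimination (reported as AUC), calibration, and clinical applicability. The sketch below computes AUC directly from its probabilistic definition. It also computes the net benefit at one risk threshold, assuming clinical applicability was assessed with decision curve analysis; the abstract does not name the method. The function names are hypothetical, and the pairwise AUC loop scales with the number of case/non-case pairs, so it is for illustration rather than large cohorts.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case scores higher than a random
    non-case, with ties counted as one half (the Mann-Whitney formulation)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit at risk threshold pt: TP/N - FP/N * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy predictions, not model output from the study.
y = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, risk))  # prints 0.75
print(net_benefit(y, risk, threshold=0.3))
```

AUC only ranks predictions, so a model can discriminate well yet be poorly calibrated; that is why calibration and clinical usefulness are assessed separately.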