Background and objectives
Malnutrition is a major public health problem and a common pathological state in clinical practice. Among all disease groups, malnutrition is particularly prominent in cancer patients because of the chronic consumptive nature of cancer. Malnutrition develops frequently and has multiple negative effects on patients, leading to decreased tolerance to anticancer treatment, decreased efficacy of antitumor therapies, increased postoperative complications, decreased quality of life and physical performance, and shortened survival time. It is estimated that up to 40% of tumor deaths are directly caused by malnutrition. Therefore, malnutrition is one of the key problems in multidisciplinary oncology care that needs to be actively addressed. However, at present, insufficient attention is paid to the nutritional status of cancer patients, and the diagnosis rate of malnutrition is low. One of the major reasons is the lack of unified, standardized criteria for diagnosing malnutrition. The European Society for Clinical Nutrition and Metabolism (ESPEN) has recently proposed the Global Leadership Initiative on Malnutrition (GLIM). The GLIM criteria integrate the expert consensus of the mainstream international clinical nutrition societies (including clinical nutrition experts from China) and the current best research evidence, and are expected to unify the diagnosis of malnutrition globally. The GLIM criteria are composed of phenotypic criteria (three subcriteria: low body mass index, weight loss, and reduced muscle mass) and etiological criteria (reduced food intake or assimilation, and inflammation/disease burden). A complete diagnosis consists of three steps: nutritional risk screening, nutritional diagnosis, and severity grading of malnutrition. In the first step, patients are screened for nutritional risk using validated nutritional risk screening tools. Among patients with nutritional risk, at least one phenotypic criterion 
and one etiological criterion need to be met for diagnosing malnutrition. Further, malnutrition is graded according to the severity of the phenotypic criteria as moderate or severe. However, because these criteria were proposed only recently, few studies have examined their ‘phenotype plus etiology’ model for assessing malnutrition in Chinese cancer populations. Some components of this framework, such as the indicators and corresponding thresholds for assessing reduced muscle mass, also need to be refined. At the same time, different phenotypes, different phenotype severities, and different etiological criteria yield a variety of diagnostic combinations, all of which create practical difficulties for the clinical diagnosis and severity grading of malnutrition. In addition, malnutrition is a multifactorial and complex disease that requires comprehensive, multi-dimensional assessment. The dimensions of malnutrition may include anthropometric measurements, dietary surveys, serum measurements, body composition analysis, metabolic analysis, etc. The variety and complexity of the evaluation data therefore also complicate nutritional diagnosis and the determination of precise nutritional treatment plans. Machine learning has significant advantages over traditional methods in processing complex, high-dimensional data and is expected to address the above issues. However, there are currently no malnutrition assessment strategies developed with big data and machine learning technologies from real-world data of Chinese populations with cancer. To address this key issue, this study sought to explore, develop and validate strategies for assessing malnutrition and related decision-making systems based on the verification and optimization of the existing GLIM framework, including:
1. To clarify the validity of the ‘phenotype plus 
etiology’ model of the GLIM criteria for assessing malnutrition and predicting survival in cancer patients, and to provide useful experience and scientific evidence for the development of new strategies for malnutrition assessment.
2. To identify the value of machine learning technologies in optimizing the process of diagnosing malnutrition and to provide decision tools for rapid diagnosis of malnutrition in clinical practice.
3. To explore potential drawbacks of the GLIM criteria when they are applied in cancer patients, and to analyze whether indicators potentially lacking from the GLIM criteria can enhance the value of the GLIM for diagnosing malnutrition and predicting clinical outcomes in cancer patients.
4. To integrate and analyze the experience gained during the optimization of the GLIM criteria to explore new strategies for assessing malnutrition based on Chinese population data.
5. To develop models that can be used for early identification and grading of malnutrition to provide decision-making evidence to improve the prognosis of oncology populations.
Methods
1. Participants of the present study were drawn from the Investigation on Nutrition Status and its Clinical Outcome of Common Cancers project of China (INSCOC, registered at chictr.org.cn, ChiCTR1800020329), which was initiated by the Chinese Society of Nutritional Oncology. The INSCOC cohort has included 18 common malignant tumors in more than 100 hospitals (class three, grade one) across the country, and its purpose was to investigate the correlation between nutritional status and clinical outcomes of patients. Our research group, as the leading participating unit of the INSCOC, has enrolled more than 5000 cancer patients.
2. Continuous variables were expressed as mean ± standard deviation or median (interquartile range), and categorical variables were expressed as numbers (percentages). Statistical approaches such as the t-test and chi-squared test were used to evaluate the relationship between the study variables and nutrition-related 
indicators, physical performance, quality of life, near-term clinical outcomes and other variables in patients with cancer. The association of malnutrition with cancer survival was evaluated using Kaplan-Meier survival curves and multivariate Cox regression analysis. A nomogram model was generated for survival prediction. Logistic regression was used to evaluate the effect of malnutrition on postoperative complications in patients with cancer who underwent surgery.
3. The overall data were randomly split, with 75% used for model training and the remaining 25% for model evaluation. Decision tree-based machine learning was used to optimize and visualize the GLIM diagnostic process in lung cancer and multi-tumor populations. The complexity parameter of the decision tree was determined using cross-validation and then applied to control the size of the tree and select the optimal model. The confusion matrix, the Kappa consistency index, accuracy and area under the curve were used to comprehensively evaluate model performance.
4. Restricted cubic spline analysis and Kaplan-Meier survival curves were used to analyze the relationship between malnutrition or nutrition-related indicators and patient survival. The optimal cut-point values of nutrition-related indicators for predicting the survival of tumor patients were calculated using the optimal stratification method. Multivariate Cox regression was used to analyze whether the study factors were independent predictors of prognosis. Variable screening for multivariate models was based on least absolute shrinkage and selection operator (LASSO) regression, or on a two-way stepwise method based on the Akaike Information Criterion or the Bayesian Information Criterion.
5. An unsupervised machine learning algorithm was used to analyze nutrition-related indicators, and malnutrition prediction models were built with a variety of supervised machine learning algorithms, including linear regression, decision tree, random forest, support vector 
machine and deep learning algorithms. The model performance and optimal algorithm were analyzed based on the overall accuracy and mean class accuracy. The reserved validation data were used to comprehensively evaluate model performance based on accuracy, the Kappa consistency index, and the multiclass area under the curve. Predictive models were visualized, and cross-platform deployable code was generated to implement the developed machine learning models.
Results
1. The incidence of malnutrition in lung cancer patients as diagnosed by GLIM was 24%. Compared with well-nourished patients, the risk of death in the GLIM-defined moderate and severe malnutrition groups was 1.36-fold (95% CI=1.12-1.63) and 1.47-fold (95% CI=1.05-2.05), respectively. A test for trend showed a dose-response relationship between GLIM-defined malnutrition severity and mortality risk (P for trend=0.002). The nomogram integrating the GLIM diagnosis showed good agreement between predicted and observed survival probability (the Hosmer-Lemeshow test was not statistically significant, P=0.673 and P=0.968 for the overall and validation cohorts, respectively). In addition, the C-index (95% confidence interval) of the GLIM nomogram in the overall and validation cohorts was 0.689 (0.659-0.718) and 0.702 (0.668-0.735), respectively. In esophageal cancer, the incidence of malnutrition diagnosed by the PG-SGA, ESPEN 2015 criteria, and GLIM was 23.1%, 12.2%, and 33.3%, respectively. Compared with the PG-SGA, the consistency (95% confidence interval) of the GLIM in diagnosing malnutrition was 0.803 (0.758-0.843) (sensitivity=0.795, specificity=0.805, Kappa=0.519, P<0.001). Compared with the ESPEN 2015 criteria, the consistency (95% confidence interval) of the GLIM in diagnosing malnutrition was 0.761 (0.714-0.804) (sensitivity=0.886, specificity=0.744, Kappa=0.361, P<0.001). Compared with the PG-SGA, the consistency (95% confidence interval) of the ESPEN 2015 criteria for diagnosing malnutrition was 
0.792 (0.746-0.833) (sensitivity=0.313, specificity=0.935, Kappa=0.297, P<0.001). Malnutrition defined by the GLIM criteria was an independent predictor of complications after esophagectomy in patients with esophageal cancer (OR=5.00, 95% CI=2.79-9.35, P<0.001). Additionally, its predictive power was superior to that of the ESPEN 2015 criteria and the Patient-Generated Subjective Global Assessment (PG-SGA).
2. Based on the results of cross-validation, gender, body mass index, weight loss within six months, weight loss over six months, calf circumference and weight-adjusted handgrip strength were finally used to construct a decision tree for diagnosing malnutrition in lung cancer patients. The accuracy of the decision tree models developed based on the GLIM criteria for lung cancer patients was 0.98 (diagnosis tree, Kappa=0.942) and 0.98 (classification tree, Kappa=0.955) in the validation data, respectively. The cut-off points of handgrip strength for lung cancer patients based on the optimal stratification method were <31.2 kg for men and <22.4 kg for women. Multivariate Cox regression analysis showed that cancer patients with low handgrip strength had an elevated mortality risk (HR=1.23, 95% CI=1.08-1.40). Moreover, these thresholds had higher prognostic value for lung cancer mortality than those in the guidelines proposed by the Asian Working Group for Sarcopenia (AWGS). In the multi-cancer population, the incidence of malnutrition diagnosed by GLIM using the calf circumference method and the calf circumference + grip strength method to assess muscle mass loss was 28% and 26.5%, respectively. GLIM-defined malnutrition based on the calf circumference method had slightly higher agreement with the PG-SGA (Kappa=0.136) than that based on the calf circumference plus handgrip strength method (Kappa=0.127). Similar to the results observed in lung cancer, the optimal decision tree model in the multi-cancer population used five variables, including age, weight loss within six 
months, body mass index, calf circumference, and the Nutritional Risk Screening 2002 (NRS2002) score. The area under the curve of the decision tree was 0.963 in the training data (Kappa=0.892, P<0.001, accuracy=0.950) and 0.964 in the validation data (Kappa=0.898, P<0.001, accuracy=0.955). Exploratory subgroup analysis showed that the decision tree model performed well across different cancer types, with areas under the curve >0.9 in all 14 cancer types. Sensitivity analysis showed that the decision tree model performed better than the NRS2002 alone in predicting malnutrition severity. The variables in the decision tree, in descending order of relative importance, were calf circumference > body mass index > NRS2002 > weight loss in six months > age according to the mean decrease accuracy index. The corresponding ranking was NRS2002 > weight loss in six months > body mass index > calf circumference > age according to the mean decrease Gini index.
3. Restricted cubic spline analysis showed that calf circumference and triceps skinfold thickness were positively correlated with overall survival in cancer patients (P<0.001). No nonlinear associations of calf circumference or triceps skinfold thickness with survival were observed (tests for nonlinearity, P=0.8327 and P=0.8728, respectively). Based on the optimal stratification method, the cut-offs for calf circumference were 30 cm for women and 32.8 cm for men, and those for triceps skinfold thickness were 21.8 mm for women and 13.6 mm for men. Multivariate Cox regression analysis showed that low calf circumference (HR=1.13, 95% CI=1.03-1.23) and low triceps skinfold thickness (HR=1.22, 95% CI=1.12-1.32) were independent risk factors for poorer survival in cancer patients. Low calf circumference and low triceps skinfold thickness showed a potential joint effect (HR=1.39, 95% CI=1.25-1.55). In patients with lung cancer, compared with patients in the well-nourished group as diagnosed by the GLIM, patients in the malnutrition plus low triceps 
skinfold thickness group had a 54% increased risk of death (HR=1.54, 95% CI=1.25-1.88). These patients also had a 23% increased risk of death (HR=1.23, 95% CI=1.06-1.43) compared with those in the malnutrition plus normal triceps skinfold thickness group. GLIM-diagnosed malnutrition combined with low triceps skinfold thickness had higher prognostic value than GLIM-defined malnutrition alone (HR=1.31, 95% CI=1.14-1.50) or low triceps skinfold thickness alone (HR=1.39, 95% CI=1.20-1.61). In the multi-cancer population, the optimal stratification method indicated that the optimal thresholds for the fat mass index (FMI) were <5 kg/m² in women and <7.7 kg/m² in men. After dividing the study population according to the derived thresholds, 50% of patients were identified as having a low FMI. In women, FMI was significantly associated with patient age (r=0.074), tumor stage (r=-0.063), NRS2002 (r=-0.435), PG-SGA (r=-0.435), Karnofsky Performance Status score (r=0.073) and quality of life (r=0.098). In men, FMI was significantly associated with NRS2002 (r=-0.236), PG-SGA (r=-0.236), Karnofsky Performance Status score (r=0.082), and quality of life (r=0.113), but not with age or tumor stage. For the cross-categories of FMI and the GLIM, the low FMI plus malnutrition (HR=1.93, 95% CI=1.48-2.52), low FMI plus well-nourished (HR=1.70, 95% CI=1.25-2.32), and normal FMI plus malnutrition (HR=1.50, 95% CI=1.10-2.04) groups had an increased risk of death compared with the normal FMI plus well-nourished group. Additionally, the independent prognostic value of the fat mass index model (C-index=0.585, 95% CI=0.563-0.607) was higher than that of the GLIM-defined malnutrition model (C-index=0.555, 95% CI=0.533-0.577) (P=0.029) in cancer. The fat mass index and GLIM-defined malnutrition have a potential joint effect on cancer prognosis.
4. After integrating the phenotypic and etiological indicators of the GLIM criteria and the fat mass indicator, the incidence of malnutrition as defined by cluster analysis was 31.6% in cancer 
patients. The clustering analysis with a larger sample size defined two groups of patients, with 8193 patients (58.0%) in group 1 and 5941 patients (42.0%) in group 2. The nutritional indicators reflected in the heat map showed that the overall nutritional status of patients in group 1 was poorer than that of patients in group 2. The second stage of clustering showed that the optimal number of clusters in the malnutrition group was 3. The prevalence of malnutrition as defined by the GLIM, clustering results, PG-SGA score (≥4), and PG-SGA category (stage B+C) in the overall population was 30.4%, 42.0%, 52.5%, and 75.5%, respectively. The prevalence of malnutrition as defined by the clustering results fell between those of the GLIM criteria and the PG-SGA. The cluster analysis results and GLIM showed the highest agreement among all the methods compared (Kappa=0.561) when defining malnutrition. Other agreement results included clustering versus PG-SGA category (Kappa=0.266), GLIM versus PG-SGA category (Kappa=0.203), clustering versus PG-SGA score (Kappa=0.441) and GLIM versus PG-SGA score (Kappa=0.416). For the incidence rates of the different assessment methods, the subgroup analysis showed that the ordering of malnutrition incidence as defined by the four methods in different cancer groups was similar to that observed in the overall population (from low to high: GLIM < cluster < PG-SGA score < PG-SGA category), except for patients with nasopharyngeal cancer. In addition, treating the incidence of malnutrition defined by different methods as a numerical value, Spearman rank correlation analysis was used to further compare the consistency of the incidence of malnutrition as defined by clustering analysis and other methods in different cancer types. The results showed a highly consistent trend of the clustering analysis in identifying the incidence of malnutrition in different tumors (Spearman correlation coefficient of the incidence of malnutrition in 17 cancers: cluster versus GLIM, 0.965; cluster 
versus PG-SGA score, 0.922; cluster versus GLIM, 0.922; vs PG-SGA category, 0.809; P<0.001). For the first stage of clustering, cluster-defined malnutrition was positively associated with various indicators reflecting impaired nutritional status, including nutritional risk as defined by the NRS2002, malnutrition as defined by the PG-SGA, malnutrition as defined by the GLIM, and the rate of use of any nutritional support. In addition, clustering analysis-defined malnutrition was also positively correlated with the decline in patients’ physical performance status and overall quality of life (P<0.05). For short-term clinical outcomes, malnutrition as defined by the clustering results was positively associated with 30-day mortality, length of hospital stay, and hospital costs. The severity of malnutrition, as defined by the second-stage clustering, was negatively associated with patients’ performance status and quality of life. Conversely, the cluster-defined adverse event was positively correlated with 30-day mortality in the study population (P<0.05). Subsequent multiple comparisons showed that both the moderate and severe malnutrition groups had longer hospital stays and higher hospitalization costs than the mild malnutrition group (P<0.05). A total of 3241 individuals in the study population died within the follow-up period, and the overall median survival time and median follow-up time were 2485 days and 1274 days, respectively. Clustering analysis-defined malnutrition was an independent risk factor for poorer survival in cancer patients (with the well-nourished cluster as the reference group, mild: HR=1.20, 95% CI=1.08-1.34; moderate: HR=1.63, 95% CI=1.50-1.78; severe: HR=1.87, 95% CI=1.68-2.08). Subgroup analyses of patients with four major tumors (lung, colorectal, breast, and stomach) and other cancers (the remaining 13 cancers combined) showed relationships similar to those observed in the overall population, suggesting that malnutrition severity as defined by the clustering analysis was positively associated with mortality 
risk (P for trend<0.05). For the malnutrition identification model, the multiple linear regression model showed almost perfect performance on the training data (Kappa=1.000, multiclass area under the curve=1.000). This performance was maintained in the validation data (Kappa=0.999, multiclass area under the curve=1.000). For the early identification model of malnutrition, although the input features were reduced (serum indicators were excluded), the multiple linear regression model showed good performance in both the training data (Kappa=0.897, multiclass area under the curve=0.934) and the validation data (Kappa=0.905, multiclass area under the curve=0.941). Exploratory subgroup analyses in the validation data further showed that the models performed equally well across different tumor types. For the malnutrition identification model, the multiclass area under the curve was >0.998 for all 17 types of cancer. For the early identification model of malnutrition, the multiclass area under the curve was >0.9 for 16 types of cancer.
Conclusions
Based on large-scale cohort data, the present study found that the ‘phenotype plus etiology’ model of the GLIM framework is feasible in Chinese oncology populations. The derived cutoffs of muscle parameters may provide important references for future clinical application. This study also improved the workflow of GLIM-based malnutrition diagnosis using machine learning. In addition, the study emphasized the value of fat mass in the assessment of malnutrition in cancer populations. By integrating the GLIM-related parameters and fat mass indices, this study developed a machine learning-based fusion decision-making system that implements the identification, grading and phenotyping of malnutrition. These results may provide scientific evidence and decision-making tools for nutritional assessment in oncology populations.
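
The three-step GLIM workflow summarized in the background (risk screening, then at least one phenotypic plus one etiological criterion, then severity grading) can be illustrated with a minimal Python sketch. This is not the study's decision tree or code. The class `GlimInput`, the function `glim_diagnosis`, and the boolean flag `severe_phenotype` are hypothetical names introduced here, and the sketch assumes that each criterion has already been evaluated elsewhere against thresholds that this abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class GlimInput:
    """Pre-evaluated GLIM criteria for one patient (hypothetical container)."""
    at_nutritional_risk: bool                      # step 1: validated screening tool result
    low_bmi: bool = False                          # phenotypic criterion
    weight_loss: bool = False                      # phenotypic criterion
    reduced_muscle_mass: bool = False              # phenotypic criterion
    reduced_intake_or_assimilation: bool = False   # etiological criterion
    inflammation_or_disease_burden: bool = False   # etiological criterion
    # Assumption: True when any phenotypic criterion reaches its "severe"
    # threshold; the thresholds themselves are not given in the abstract.
    severe_phenotype: bool = False


def glim_diagnosis(p: GlimInput) -> str:
    """Apply the 'phenotype plus etiology' rule and grade severity."""
    # Step 1: only patients at nutritional risk proceed to diagnosis.
    if not p.at_nutritional_risk:
        return "not at risk"
    # Step 2: at least one phenotypic AND at least one etiological criterion.
    phenotypic = p.low_bmi or p.weight_loss or p.reduced_muscle_mass
    etiological = (p.reduced_intake_or_assimilation
                   or p.inflammation_or_disease_burden)
    if not (phenotypic and etiological):
        return "at risk, not malnourished"
    # Step 3: severity follows the phenotypic criteria (moderate or severe).
    return "severe malnutrition" if p.severe_phenotype else "moderate malnutrition"
```

For example, a patient at nutritional risk with weight loss and disease burden but no severe phenotypic finding would be graded as moderate malnutrition, whereas a patient with a phenotypic criterion but no etiological criterion would not be diagnosed.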