| Objective: Lung cancer has a significantly higher incidence and mortality than other malignant tumors worldwide and consistently ranks first among malignant tumors in incidence. With advances in diagnosis and treatment, survival has improved in recent years, but lung cancer remains the deadliest cancer. Although early screening with low-dose CT is actively carried out around the world, the early detection rate is still low. The scale and population coverage of early screening programs may be limited by economic differences between regions. As the proportion of lung cancer arising in the non-smoking population increases year by year, the scope of screening needs to expand to match the changing epidemiological characteristics, so as to improve the detection rate of early lung cancer and prolong patient survival. Some previous studies have constructed lung cancer risk prediction models based on serum small-molecule substances and demographic characteristics and applied them to early screening based on low-dose CT. However, most of these models did not consider lung cancer risk in non-smokers, and the risk signatures involved lacked extensive validation. In this study, 13 metabolites with diagnostic potential (propionyl carnitine, pentenoyl carnitine, 26-carbonyl carnitine, hydroxybutyryl carnitine, lauryl carnitine, arginine, asparagine, citrulline, palmitoleic acid, total monounsaturated fatty acids, oleic acid, eicosatrienoic acid, and saturated fatty acids) were identified by analyzing the correlation between lung cancer risk and serum small-molecule metabolites. Independent prediction models for smokers and non-smokers were constructed by stratifying patients according to smoking history. The models were validated for their ability to predict the risk of lung cancer in the different smoking groups, which can guide the selection of the population for early lung cancer screening. Methods: This was a retrospective study 
that included 1723 participants, of whom 1109 had pathologically confirmed lung cancer and 614 were healthy controls. Demographic information (age, sex, and smoking history) and clinical data comprising 65 serum metabolite measurements were collected, and the whole population was divided into two groups according to smoking status. Difference and correlation analyses of the serum metabolites were performed for each group. Statistical analysis was performed using SPSS 24.0 and R 4.0.3. Normality was assessed with the Shapiro-Wilk test; the Mann-Whitney U test and the χ² test were used for univariate analysis, and Lasso regression and stepwise logistic regression for multivariate analysis, with P < 0.05 indicating statistical significance. After modeling, the discrimination of each model was assessed using the receiver operating characteristic (ROC) curve and the area under the curve (AUC), and the fit of each model was evaluated using the Hosmer-Lemeshow test and the unreliability test. Decision curve analysis (DCA) was used to determine the net clinical benefit of each model. Results: 1. Differential analysis of metabolic profiles: A total of 30 metrics differed significantly (P < 0.05) in the smoking group (propionyl carnitine, butyldiacyl carnitine, hydroxybutyryl carnitine, pentenoyl carnitine, aoiacyl carnitine, hexanoyl carnitine, hydroxyisovaleryl carnitine, hydroxymyristoyl carnitine, palmitoyl carnitine, hydroxypalmitoyl carnitine, eicosanoid carnitine, 24-carbonyl carnitine, 26-carbonyl carnitine, alanine, arginine, asparagine, citrulline, serine, valine, palmitoleic acid, oleic acid, eicosatrienoic acid, arachidonic acid, total polyunsaturated fatty acids, total fatty acids, arachidonic acid, total monounsaturated fatty acids, palmitic acid, saturated fatty acids, unsaturated fatty acids ω6). In the non-smoking group, a total of 39 metrics differed significantly (P < 0.05) (propionyl carnitine, butyryl carnitine, butyldiacyl carnitine, hydroxybutyryl carnitine, isopentanoyl 
carnitine, pentanoyl carnitine, hexanoyl carnitine, sunflower carnitine, hydroxyisopentanoyl carnitine, lauryl carnitine, hydroxymyristoyl carnitine, myristoyl diacyl carnitine, palmitoyl carnitine, hydroxypalmitoyl carnitine, hydroxy eicosanoid carnitine, eicosanoid carnitine, docosanoid carnitine, docosanoid carnitine, docosanoid carnitine, asparagine, glutamic acid, methionine, ornithine, phenylalanine, serine, tyrosine, valine, oleic acid, eicosatrienoic acid, eicosatrienoic acid, arachidic acid, myristic acid, total polyunsaturated fatty acids, total fatty acids, docosatetraenoic acid, monounsaturated fatty acids, palmitic acid, saturated fatty acids, unsaturated fatty acids ω6). 2. Relevance analysis of metabolic profiles: A total of eight variables were significantly associated with the risk of lung cancer in the smoking group: propionyl carnitine (OR: 0.672, 95% CI: 0.456-0.992, P = 0.045), pentenoyl carnitine (OR: 0.981, 95% CI: 0.968-0.995, P = 0.006), 26-carbonyl carnitine (OR: 0.983, 95% CI: 0.964-1.002, P = 0.087), arginine (OR: 1.121, 95% CI: 1.045-1.203, P = 0.001), asparagine (OR: 0.982, 95% CI: 0.969-0.995, P = 0.007), citrulline (OR: 1.069, 95% CI: 1.027-1.114, P = 0.001), palmitoleic acid (OR: 1.019, 95% CI: 1.012-1.026, P < 0.001), and total monounsaturated fatty acids (OR: 0.215, 95% CI: 0.126-0.365, P < 0.001). A total of eight variables were significantly associated with the risk of lung cancer in the non-smoking group: sex (OR: 0.598, 95% CI: 0.42-0.852, P = 0.004), hydroxybutyryl carnitine (OR: 0.993, 95% CI: 0.99-0.997, P < 0.001), lauryl carnitine (OR: 1.017, 95% CI: 1.01-1.024, P < 0.001), cetyl carnitine (OR: 0.989, 95% CI: 0.98-0.997, P = 0.009), asparagine (OR: 0.976, 95% CI: 0.968-0.983, P < 0.001), oleic acid (OR: 0.619, 95% CI: 0.487-0.786, P < 0.001), eicosatrienoic acid (OR: 1.01, 95% CI: 1.006-1.014, P < 0.001), and saturated fatty acids (OR: 0.88, 95% CI: 0.801-0.968, P = 0.008). 3. Construction and validation of clinical risk models: The lung cancer risk prediction models were constructed using logistic regression equations, and the relationships between lung 
cancer outcome, each covariate, and the regression coefficients were converted into visual scoring charts (nomograms) with the rms::nomogram function in R, from which the performance of the models was assessed. For discrimination, the smoker model achieved an AUC of 0.860 (95% CI: 0.814-0.906) in the training set and 0.850 (95% CI: 0.774-0.926) in the validation set. The non-smoker model achieved an AUC of 0.783 (95% CI: 0.753-0.813) in the training set and 0.762 (95% CI: 0.710-0.813) in the validation set. For calibration, the Hosmer-Lemeshow (H-L) test (training set P = 0.082, validation set P = 0.096) and the unreliability (U) test (training set P = 0.640, validation set P = 0.729) indicated that the smoker model was adequately calibrated. The H-L test (training set P = 0.699, validation set P = 0.512) and the U test (training set P = 0.973, validation set P = 0.756) likewise indicated adequate calibration of the non-smoker model. For net clinical benefit, at risk thresholds between 0.08 and 0.94 the net benefit of the smoker model was higher than that of both the treat-none and the treat-all strategies. At risk thresholds between 0.01 and 0.95, the net benefit of the non-smoker model was likewise higher than that of the treat-none and treat-all strategies. Conclusions: This study explored the metabolic profile characteristics of non-small cell lung cancer in smoking and non-smoking populations and derived a series of metabolites that were significantly associated with lung carcinogenesis. The models were validated for their ability to discriminate lung cancer risk in groups with different smoking histories, and they have implications for selecting the population for early lung cancer screening. |
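
The decision curve analysis reported in the abstract compares a model's net benefit with the two default strategies of treating nobody and treating everybody. The Python sketch below is only an illustration of the standard net-benefit formula, NB = TP/n - (FP/n) * pt / (1 - pt), where pt is the risk threshold. It is not the study's code (the study used SPSS and R), and the function names and the toy labels and risks are hypothetical, not data from the study.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], risk: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among those flagged at this threshold.
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the treat-all strategy (everyone is flagged)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = lung cancer, 0 = control.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
risk = [0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.70, 0.65]

nb_model = net_benefit(y_true, risk, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
nb_none = 0.0  # treating nobody yields no benefit and no harm by definition

print(f"model: {nb_model:.3f}, treat-all: {nb_all:.3f}, treat-none: {nb_none:.3f}")
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison the abstract reports across its threshold ranges.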