Background: The latest global cancer data show that lung cancer is the leading cause of cancer-related death among all malignancies. Primary lung cancer is divided into non-small cell lung cancer (NSCLC) and small cell lung cancer. For NSCLC, traditional therapies including radiotherapy and chemotherapy have relatively poor therapeutic effects. However, targeted therapy with epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs) can significantly improve progression-free survival (PFS). The prerequisite for EGFR-TKI targeted therapy is to determine EGFR mutation status through tissue or liquid biopsy, but such testing is mostly invasive, costly, and has a long turnaround time. Because tumors are heterogeneous, small biopsy specimens may also be unrepresentative, resulting in false negatives. Therefore, non-invasive prediction of EGFR mutation status has potential clinical value. In addition, the efficacy of TKI treatment varies among patients, and some patients quickly develop resistance. Thus, non-invasive prediction of TKI efficacy also has an important guiding role in the management of patients with advanced NSCLC. Radiomics and deep learning have risen rapidly in recent years. Previous studies by our research group and other scholars have indicated that radiomics models can predict EGFR mutations and PFS under TKI treatment in advanced NSCLC. However, most studies have used traditional radiomics to predict EGFR mutations and TKI efficacy. Only a few studies have used deep learning, and their small sample sizes limit the generalization of the prediction models. In addition, studies have shown that imaging features of the lung outside the tumor are also important predictors of EGFR mutations, yet no deep learning study has considered these factors. Therefore, our study intends to analyze the characteristics of EGFR mutations in a large sample population, apply whole-lung deep learning for the first
time to predict EGFR mutations, and establish a deep learning model at multiple timepoints to predict TKI efficacy based on pretreatment and post-treatment chest CT scans, in order to guide precision targeted therapy of lung cancer.

Objective:
1. To enroll patients initially diagnosed with NSCLC by pathology and tested for EGFR, and to analyze the characteristics of EGFR mutations in NSCLC patients.
2. To establish non-invasive prediction models for EGFR mutations in NSCLC patients.
3. To establish non-invasive prediction models for TKI efficacy in advanced NSCLC patients with EGFR-sensitive mutations.

Materials and Methods:
1. A total of 7055 patients who were initially diagnosed with NSCLC by pathology and tested for EGFR at West China Hospital of Sichuan University from 2009 to 2018 were included. We collected patients’ EGFR mutation status and clinicopathological characteristics, including age, gender, smoking history, smoking amount, family history of cancer, tumor location, histological subtype and tumor stage. We then analyzed the characteristics of EGFR mutations in NSCLC patients.
2. Establishment of non-invasive prediction models for EGFR mutation. The IQQA software was used to extract DICOM files of each patient’s chest CT, acquired within 1 month before surgery or biopsy, from the PACS system of our hospital. All patients were randomly divided into a training set (N=5645), a validation set (N=705) and a testing set (N=705) at a ratio of 8:1:1. First, we established a clinical prediction model on the training set. We then established a tumor-based deep learning model and a whole-lung deep learning model using chest CT, tuned the parameters of the deep learning models on the validation set, and evaluated the models on the testing set. Subsequently, we screened clinical indicators based on the optimal deep learning model and established a deep learning-clinical combined model to further optimize the performance of our prediction model. Then, patients who were initially diagnosed with NSCLC by pathology with EGFR
status tested from January 2019 to September 2019 in our hospital were prospectively enrolled for prospective validation (N=891). Finally, data from The Cancer Imaging Archive (TCIA), an international public database, were used for external validation (N=154). We evaluated the models using the area under the ROC curve (AUC), sensitivity, specificity and accuracy.
3. Establishment of non-invasive prediction models for TKI efficacy. Patients with sensitive EGFR mutations and regular follow-up at our hospital for at least 1 year after initiation of TKI therapy (N=920) were included, and clinicopathological characteristics, laboratory indicators, and chest CT scans were collected to predict 12-month progression-free survival (PFS). First, we established and tested a clinical prediction model in all included patients. Second, for patients who had chest CT within 1 month before the initiation of targeted therapy (N=603), radiomics and deep learning models at a single timepoint were established and tested. Among them, patients who also underwent chest CT 6-8 weeks after the initiation of targeted therapy (N=489) were selected, and a deep learning model at multiple timepoints based on pretreatment and post-treatment chest CT scans was established and tested. Subsequently, we screened clinical indicators based on the optimal deep learning model, and a deep learning-clinical combined model was established to further optimize the performance of our prediction model. All cohorts were divided into a training set and a testing set at a ratio of 4:1. We evaluated the models using AUC, sensitivity, specificity and accuracy. Finally, according to the scores of the prediction models, patients were divided into a high-risk progression group and a low-risk progression group, and differences in PFS between the two groups were compared.
4. Statistical analysis methods. The rank sum test was used to compare continuous variables between groups, the chi-square test was used to compare categorical variables between groups, and the log-rank test
was used to evaluate differences in PFS between groups. To evaluate the models, we mainly calculated the AUC of the ROC curve, together with other indicators such as sensitivity, specificity and accuracy. A two-sided P<0.05 was considered statistically significant. SPSS 26.0, R software and Python 3.7 were used for statistical analysis, and the deep learning models were implemented with the Keras toolkit (version 2.4.3) with a TensorFlow backend.

Results:
1. Characteristics of EGFR mutations in NSCLC patients
A total of 13,565 lung cancer patients who underwent surgery or biopsy and EGFR mutation testing at West China Hospital were screened. After applying the inclusion and exclusion criteria, 7055 patients were included in our study. The average age of all enrolled patients was 58.5±10.92 years. Of the patients, 53.9% were male, 39.1% had a smoking history with an average smoking amount of 13.5±21.25 pack-years, and 16.1% had a family history of cancer. The histopathological subtype was mainly adenocarcinoma (84.7%); 10.2% were squamous cell carcinoma, and 5.0% were other types of NSCLC. Among them, 3574 patients were EGFR wild-type and 3481 patients harbored EGFR mutations, giving an EGFR mutation rate of 49.3%. Compared with EGFR wild-type patients, EGFR-mutant patients were more often women (59.8% vs 32.7%, P<0.001), more often never-smokers (68.9% vs 40.2%, P<0.001), and more often had a family history of cancer (18.1% vs 14.2%, P<0.001). The proportion of adenocarcinoma among EGFR wild-type patients (74.3%) was significantly lower than that among EGFR-mutant patients (95.4%), while the proportion of squamous cell carcinoma among EGFR wild-type patients (18.5%) was significantly higher than that among EGFR-mutant patients (1.8%, P<0.001). Among EGFR-mutant patients, 96.0% had sensitive mutations, while non-sensitive mutations accounted for only 4.0%. EGFR-sensitive mutations included common sensitive mutations (89.6%), uncommon or rare sensitive mutations (5.9%) and other
sensitive mutations (0.5%). Among patients with common sensitive mutations, 47.2% were L858R mutations and 42.0% were 19-Del. Among patients with uncommon or rare sensitive mutations, 2.2% were G719X mutations, 2.0% were L861Q mutations, and 0.4% were S768I mutations. Among patients with non-sensitive EGFR mutations, 2.5% were 20-Ins mutations and 1.4% were mutation types containing T790M. The proportion of smokers among patients with uncommon sensitive mutations (32.2%) was significantly higher than that among patients with common sensitive mutations (23.9%) and non-sensitive mutations (23.9%, P=0.005). In addition, the proportion of patients with a family history of cancer among those with common sensitive mutations (18.8%) was significantly higher than that among those with uncommon sensitive mutations (11.7%) and non-sensitive mutations (12.3%, P=0.006). There were no significant differences in gender, age, tumor location, histological subtype, or tumor stage among the three groups.
2. Non-invasive models to predict EGFR mutation status in NSCLC patients
Three cohorts were used to establish and test non-invasive EGFR mutation prediction models: a retrospective cohort of 7055 patients (randomly divided into training, validation and testing sets at 8:1:1), a prospective cohort of 891 patients, and an international cohort of 154 patients from the TCIA dataset.
2.1 Clinicopathological characteristics of patients
In the prospective cohort, the average age of patients was 59.02±11.23 years and 52.0% were male. 22.8% of patients were ever or current smokers, and 13.0% had a family history of cancer. Most tumors were adenocarcinoma (84.8%), followed by squamous cell carcinoma (10.9%). EGFR mutations were present in 49.3% of patients. In the TCIA cohort, the average age of patients was 67.86±10.02 years and 66.2% were men. 74.7% of patients had a smoking history. The proportions of adenocarcinoma and squamous cell carcinoma were 87.7% and 10.3%, respectively. In this cohort, 25.3% of patients harbored EGFR
mutations. Characteristics of the patients in the retrospective cohort are detailed in Result 1. Compared with the training set, there was no significant difference in the distribution of clinicopathological characteristics in the validation set or the testing set.
2.2 Clinical prediction model
Univariate and multivariate logistic regression analysis indicated that gender, smoking history, smoking amount, family history of cancer and histological subtype were independent predictors of EGFR mutation status. We incorporated these variables into a logistic regression model with the following formula: Y = -1.939 + 0.210 * women + 0.232 * never-smokers - 0.021 * smoking amount + 0.322 * tumor family history + 2.099 * adenocarcinoma (or 0.890 * other non-squamous NSCLC). For predicting EGFR mutation status, the clinical model had an AUC of 0.711, sensitivity of 0.800, and specificity of 0.552 in the training set; an AUC of 0.741, sensitivity of 0.734, and specificity of 0.647 in the validation set; an AUC of 0.727, sensitivity of 0.760, and specificity of 0.609 in the testing set; and an AUC of 0.714, sensitivity of 0.882, and specificity of 0.471 in the prospective cohort.
2.3 Tumor-based deep learning prediction model
The tumor-based deep learning model had an AUC of 0.863, sensitivity of 0.838, and specificity of 0.738 in the training set; an AUC of 0.700, sensitivity of 0.733, and specificity of 0.574 in the validation set; an AUC of 0.732, sensitivity of 0.725, and specificity of 0.641 in the testing set; and an AUC of 0.645, sensitivity of 0.923, and specificity of 0.263 in the TCIA cohort.
2.4 Whole-lung deep learning prediction model
In the training set, the AUC of the whole-lung deep learning prediction model was 0.857, the sensitivity was 0.819, and the specificity was 0.733. In the validation set, its AUC was 0.779, the sensitivity was 0.746, and the specificity was 0.650. In the testing set, its AUC was 0.759, the sensitivity was 0.687, and the specificity was 0.676. In
the prospective cohort, its AUC was 0.756, sensitivity was 0.670, and specificity was 0.723. In the TCIA cohort, its AUC was 0.755, sensitivity was 0.590, and specificity was 0.826.
2.5 Prediction model combining whole-lung deep learning and clinical features
Performance was further optimized by combining the whole-lung deep learning features with clinical features, including gender, smoking amount, and histological subtype. The AUCs of the combined model in the training set, validation set, testing set, prospective cohort, and TCIA cohort were 0.873, 0.808, 0.804, 0.785 and 0.802, respectively.
3. Non-invasive prediction models of TKI efficacy in advanced NSCLC patients with EGFR-sensitive mutations
We included 920 advanced NSCLC patients with EGFR-sensitive mutations to establish TKI efficacy prediction models and assess the risk of disease progression within 12 months.
3.1 Clinicopathological characteristics of patients
The average age of patients was 58.07±10.72 years. The histological type was mainly adenocarcinoma (96.6%), followed by squamous cell carcinoma (2.4%); other histological subtypes accounted for only 1.0%. The proportions of patients with stage ⅢB, ⅢC, ⅣA and ⅣB disease were 2.5%, 1.6%, 54.1% and 41.7%, respectively. Of the EGFR mutations, 97.5% were common sensitive mutations and 2.5% were rare sensitive mutations. In terms of TKIs, 68.3% of patients received gefitinib and 31.7% received icotinib. The median PFS for all patients was 12.1 months (95% CI: 11.1-12.7).
3.2 Clinical prediction model
Univariate and multivariate Cox survival analysis indicated that several clinical variables were independent predictors of PFS, including histological subtype, brain metastasis, pleural metastasis, platelet-to-lymphocyte ratio (PLR), lymphocyte-to-monocyte ratio (LMR), and albumin-to-alkaline phosphatase ratio (AAPR). We then incorporated these variables into a clinical model to predict whether patients receiving TKI treatment would progress within 12 months. In
the training set and testing set, the AUC of the prediction model was 0.63 and 0.59, respectively.
3.3 Radiomics model
We included 603 patients who had chest CT within one month before targeted therapy, delineated tumor contours on their CT scans, and extracted 1218 radiomics features. Through univariate Cox analysis and LASSO regression, we selected four features to establish the radiomics model. In the training set, the AUC of the radiomics model was 0.615, the sensitivity was 0.562, and the specificity was 0.594. In the testing set, the AUC was 0.614, the sensitivity was 0.623, and the specificity was 0.550. Patients were divided into two groups according to the model scores. The median PFS of the high-risk group in the testing set was 10.4 months (95% CI: 9.3-11.7), while the median PFS of the low-risk group was 13.9 months (95% CI: 12.3-15.3, P<0.05).
3.4 Deep learning model at a single timepoint
We extracted 128-dimensional features from tumor contours on chest CT images acquired before TKI treatment, and established a prediction model at a single timepoint using deep learning. In the training set, the AUC was 0.697, the sensitivity was 0.356, and the specificity was 0.873. In the testing set, the AUC was 0.680, the sensitivity was 0.292, and the specificity was 0.833. According to the model scores, the median PFS was 10.0 months (95% CI: 9.2-10.8) in the high-risk group and 16.4 months (95% CI: 14.6-17.8, P<0.05) in the low-risk group.
3.5 Deep learning model at multiple timepoints
Using two CT scans, one before and one after treatment, 128-dimensional features were extracted from tumor contours, and a prediction model at multiple timepoints was established through deep learning. In the training set, the AUC of the model was 0.825, the sensitivity was 0.543, and the specificity was 0.888. In the testing set, the AUC was 0.728, the sensitivity was 0.500, and the specificity was 0.812. Patients were divided into two groups according to the model scores. The median PFS of the high-risk group in the testing set was 10.2 months (95% CI:
9.4-11.0), and the median PFS of the low-risk group was 16.0 months (95% CI: 14.0-18.1, P<0.05).
3.6 Combined prediction model
LASSO analysis suggested that age, smoking amount, tumor stage, brain metastasis, contralateral lung metastasis, neutrophil-to-lymphocyte ratio (NLR), PLR and LMR were highly correlated with progression-free survival in the patients enrolled for the deep learning model at multiple timepoints. Combining these variables with the deep learning features at multiple timepoints further optimized the performance of the combined model. In the training set, the combined model had an AUC of 0.896, sensitivity of 0.718, and specificity of 0.902. In the testing set, the AUC was 0.734, the sensitivity was 0.604, and the specificity was 0.750. The median PFS of the high-risk group in the testing set was 9.9 months (95% CI: 9.0-10.6), and the median PFS of the low-risk group was 16.0 months (95% CI: 14.2-18.0, P<0.05).

Conclusion:
1. The mutation rate of EGFR in NSCLC patients was 49.3%. Among them, 96.0% were sensitive mutations, and non-sensitive mutations accounted for only 4.0%. Among the common sensitive mutations, 47.2% were L858R and 42.0% were 19-Del. EGFR mutations are more commonly seen in women, never-smokers, and lung adenocarcinoma patients with a family history of cancer.
2. Using deep learning, we constructed a whole-lung deep learning model to non-invasively predict EGFR mutations, which was superior to the tumor-based deep learning model and the clinical model. When combined with clinical features, the prediction performance of the whole-lung deep learning model was further optimized.
3. Using chest CT scans before and after treatment, we established a deep learning model at multiple timepoints to predict TKI efficacy in advanced NSCLC patients with EGFR-sensitive mutations. Based on this model, patients receiving first-generation TKIs can be divided into a high-risk group and a low-risk group in terms of disease progression, so as to accurately guide the clinical medication of
patients with advanced NSCLC.
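
As an illustration of the clinical model in Result 2.2, the following minimal Python sketch evaluates the reported linear predictor Y and converts it to a probability. It assumes the standard logistic link, p = 1 / (1 + exp(-Y)), 0/1 coding for sex, smoking status and family history, smoking amount in pack-years, and squamous cell carcinoma as the reference histology. None of these encodings is stated explicitly in the abstract, and the function, constant, and category names are hypothetical.

```python
import math

# Coefficients as reported in Result 2.2 of the abstract.
INTERCEPT = -1.939
COEF_WOMAN = 0.210
COEF_NEVER_SMOKER = 0.232
COEF_SMOKING_AMOUNT = -0.021   # assumed to be per pack-year
COEF_FAMILY_HISTORY = 0.322

# Histology terms; squamous cell carcinoma as the reference level (0.0)
# is an assumption, and the dictionary keys are hypothetical labels.
COEF_HISTOLOGY = {
    "adenocarcinoma": 2.099,
    "other_nonsquamous": 0.890,
    "squamous": 0.0,
}


def egfr_linear_score(woman, never_smoker, pack_years, family_history, histology):
    """Return the linear predictor Y of the clinical model."""
    return (
        INTERCEPT
        + COEF_WOMAN * int(woman)
        + COEF_NEVER_SMOKER * int(never_smoker)
        + COEF_SMOKING_AMOUNT * float(pack_years)
        + COEF_FAMILY_HISTORY * int(family_history)
        + COEF_HISTOLOGY[histology]
    )


def egfr_mutation_probability(woman, never_smoker, pack_years, family_history, histology):
    """Map Y to a probability with the logistic function (assumed link)."""
    y = egfr_linear_score(woman, never_smoker, pack_years, family_history, histology)
    return 1.0 / (1.0 + math.exp(-y))


if __name__ == "__main__":
    # Hypothetical patient: woman, never-smoker, no family history, adenocarcinoma.
    p = egfr_mutation_probability(True, True, 0.0, False, "adenocarcinoma")
    print(f"Predicted probability of EGFR mutation: {p:.3f}")
```

Under these assumptions, a female never-smoker with adenocarcinoma and no family history of cancer has Y = 0.602, while a male smoker with 40 pack-years and squamous cell carcinoma has Y = -2.779. The classification threshold used to obtain the reported sensitivity and specificity is not given in the abstract, so the sketch stops at the probability.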