| Background: Low-dose computed tomography (LDCT) is commonly used to screen for lung cancer in high-risk patients. Under the original definition of a positive screen from the National Lung Screening Trial, LDCT has a positive screening rate of 27%, and 96% of these positives are determined to be false positives. Distinguishing small malignant nodules from benign ones on computed tomography (CT) scans is particularly challenging because their radiographic characteristics are ambiguous. The problem is especially pronounced in Chinese hospitals, where it is further complicated by the relatively high prevalence of tuberculoma and pneumoconiosis in the Chinese population. Objective: This study investigates the clinical effectiveness of a liquid biopsy-based multi-analytical approach for diagnosing lung cancer in patients with pulmonary nodules. Methods: (1) This was a prospective cohort study. A total of 127 patients with pulmonary nodules were enrolled in two cohorts: 98 in the discovery cohort and 29 in the validation cohort. We collected and analyzed the patients' clinical information and performed several biomarker tests, including tumor-associated proteins, cfDNA mutations, and cfDNA methylation, for subsequent model construction. Tumor-associated protein data were missing for two patients because of insufficient plasma, and these patients were excluded from construction of the multi-omic diagnostic model. (2) Separate models for distinguishing benign from malignant pulmonary nodules were built on clinical information, tumor-associated proteins, cfDNA mutations, and cfDNA methylation, respectively. To combine the four single models, we then constructed a multi-omic fusion diagnostic model using a machine learning algorithm. (3) The correlation between the performance of the multi-omic fusion diagnostic model and nodule size was further analyzed, and its diagnostic efficacy was calculated in the ≤1 cm, 1-2 cm, and 2-3 cm nodule size subgroups. (4) A cohort of 61
patients with pulmonary nodules who underwent PET-CT examination was retrospectively collected to indirectly compare the advantage of the multi-omic fusion diagnostic model in differentiating malignant nodules from tuberculous nodules. Results: (1) Ninety-nine plasma samples were initially collected for the discovery cohort. One sample was excluded after failing sample quality control, leaving 98 samples (28 from patients with benign pulmonary nodules and 70 from patients with malignant pulmonary nodules) for multi-omic testing. After the discovery study, a separate set of 29 samples (14 benign and 15 malignant) was enrolled as an independent validation cohort. (2) Patient age consistently showed statistical significance under our selection criteria, with a mean AUC of 0.77 by bootstrapping on the discovery cohort. (3) In univariate analysis on the discovery cohort, CEA, CYFRA 21-1, and SCC showed statistical significance, with predictive AUCs of 0.72, 0.68, and 0.67, respectively. Using these three markers, a multivariate predictive model based on a support vector machine (SVM) was constructed and tested on the discovery cohort, with a bootstrapped AUC of 0.71. (4) We categorized each sample's mutations into four functional levels according to their match to publicly available hotspots and their functional annotations, on the assumption that mutations at the same functional level can be treated as equivalent in the process of cancer cell evolution. Each category was then represented by two numeric features: the count of mutations and the maximal variant allele frequency of those mutations. Together, these steps aggregated each sample's mutation profile into eight numerical features. An SVM built on these features achieved an AUC of 0.54 on the discovery cohort. (5) Using the discovery cohort data, methylated CpG sites were first clustered into 697 methylation-correlated blocks (MCBs) for feature representation. Forty-three MCBs showed statistical significance on the
discovery cohort, of which 30 were selected as multivariate predictors through machine learning. An SVM model using the 30 MCBs achieved an AUC of 0.81 on the discovery cohort. (6) A Bernoulli naive Bayes (BNB) model was trained on the discovery cohort, using each individual model's predictive output as its input and the sample's pathological classification as the target. The integrative multi-omic BNB model achieved a significantly improved performance of AUC = 0.85 on the discovery cohort. (7) Although the predictive models of the individual testing platforms fluctuated in performance to varying degrees, the integrative multi-omic model held steady, with AUC = 0.86 on the validation cohort, corresponding to sensitivity = 80% and specificity = 85.7% at a prediction score cut-off of 0.761. (8) Our data generally supported a correlation between cfDNA quantity and nodule size: the average extracted cfDNA quantity in the discovery cohort changed from 551 ng/ml for ≤1 cm to 613.35 ng/ml for 1-2 cm and 625.71 ng/ml for 2-3 cm; in the independent validation cohort, the corresponding values were 858, 703.33, and 1015.75 ng/ml. However, the performance of the integrative multi-omic model, whether measured by AUC, sensitivity, or specificity, fluctuated to some degree but did not show such a positive correlation with nodule size. (9) As a baseline reference, we studied an independent cohort of 61 patients who received PET-CT, comprising 50 with malignant nodules and 11 diagnosed with tuberculosis. Notably, if SUVmax were used for decision making, 9 of 11 tuberculosis samples with SUVmax > 2.5 would have been misdiagnosed as malignant, corresponding to AUC = 0.65, sensitivity = 90%, and specificity = 9.1%. In contrast, for the 23 patients in our independent validation cohort who had either malignant nodules or tuberculosis, our integrative multi-omic model achieved AUC = 0.94, sensitivity = 80%, and specificity = 87.5%. Conclusion: Multi-omic liquid biopsy techniques can be used
to identify specific blood biomarkers, including tumor marker proteins, cfDNA mutations, and cfDNA methylation, for distinguishing benign from malignant pulmonary nodules. By combining these liquid biopsy biomarkers with patients' clinical characteristics, the multi-omic fusion diagnostic model constructed with a machine learning algorithm can improve the diagnostic accuracy for lung nodules and reduce overdiagnosis of patients with benign nodules. |
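
The mutation encoding described in Results (4), in which four functional levels are each summarized by a mutation count and a maximal variant allele frequency, can be illustrated with a minimal sketch. It is not the authors' implementation: the level labels, the dictionary keys, the sample data, and the choice of 0.0 as the maximal frequency for a level with no mutations are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical labels for the four functional levels; the abstract does not name them.
LEVELS = ("level_1", "level_2", "level_3", "level_4")


def mutation_features(mutations: List[Dict]) -> List[float]:
    """Collapse one sample's mutation list into eight numbers.

    For each functional level, emit the number of mutations at that level
    followed by the maximal variant allele frequency (VAF) among them.
    A level with no mutations yields 0.0 for both values (an assumption).
    """
    features: List[float] = []
    for level in LEVELS:
        vafs = [m["vaf"] for m in mutations if m["level"] == level]
        features.append(float(len(vafs)))            # count of mutations
        features.append(max(vafs) if vafs else 0.0)  # maximal VAF
    return features


# Made-up example sample: two mutations at level 1, one at level 3.
sample = [
    {"gene": "EGFR", "level": "level_1", "vaf": 0.012},
    {"gene": "TP53", "level": "level_1", "vaf": 0.030},
    {"gene": "KRAS", "level": "level_3", "vaf": 0.004},
]

print(mutation_features(sample))
```

A fixed-length vector of this kind is what allows samples with different numbers of detected mutations to be passed to a single classifier such as the SVM mentioned above.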