Lung cancer is the most common malignancy in humans and the leading cause of cancer-related death worldwide. Non-Small Cell Lung Cancer (NSCLC) is the main subtype of lung cancer, accounting for about 85% of cases. NSCLC shows a high degree of intratumoral heterogeneity, and its clinical manifestations, drug efficacy, treatment sensitivity, and drug resistance are all closely related to this heterogeneity. Lung Adenocarcinoma (LUAD) and Lung Squamous Cell Carcinoma (LUSC) are the two most common subtypes of NSCLC. Immunotherapy can significantly improve survival and quality of life in some patients, so using molecular typing to identify the patients most likely to benefit from immunotherapy is of considerable clinical importance. Given the rapid development of single-cell sequencing technology and the growing use of immunotherapy in lung cancer, this study integrated single-cell and bulk sequencing data resources. Based on screened immune-related genes, we explored the molecular characteristics and immune-related molecular subtypes of the two lung cancer subtypes, and used a variety of machine learning algorithms to identify the best immune-related prognostic models. This study aimed to systematically explore the molecular characteristics of LUAD and LUSC and the differences between them, and to provide a data reference for NSCLC-related research. In addition, based on these results, we constructed a lung cancer database that provides online analysis tools and methodological references for similar research. The main contents of this study are summarized as follows:

1. To identify differentially expressed immune-related genes at the single-cell level, single-cell RNA sequencing (scRNA-seq) resources were integrated and analyzed, yielding 158 and 135 differentially expressed immune-related genes in LUAD and LUSC, respectively. Thirty-five of these genes were shared by both lung cancer tissue subtypes. The single-cell atlases showed that T cells accounted for the highest proportion of cells in LUAD and epithelial cells for the lowest, whereas epithelial cells accounted for the highest proportion in LUSC.

2. To identify immune-related genes from transcriptome data, we used The Cancer Genome Atlas (TCGA) transcriptome data to identify two immune-related subtypes in each of LUAD and LUSC. The subtypes were defined with single-sample Gene Set Enrichment Analysis (ssGSEA) and consensus clustering based on immune-related genes, and were then compared. The association between the level of immune infiltration and patient prognosis differed between the two cancers, and a high level of immune infiltration was more favorable for the survival of LUAD patients. The chemokines CXCL9 and CXCL11 were significantly more highly expressed in the c1 subtype of LUAD, and they may influence patient prognosis and survival by affecting the abundance of infiltrating immune cells. Most immune checkpoint genes were significantly overexpressed in the highly immune-infiltrated subtypes of both cancers. The correlations between tumor mutational burden (TMB) and immune-related features differed between LUAD and LUSC: in a sex-stratified analysis, female LUAD patients showed stronger TMB-immune feature correlations, whereas the opposite was observed in LUSC. TP53, TTN, CSMD3 and MUC16 were among the most frequently mutated genes in both cancers, but their mutation frequencies differed considerably. For example, the mutation frequency of TP53 in the c1 subtype of LUSC was 29% higher than in the c1 subtype of LUAD. Drug sensitivity analysis showed that the highly immune-infiltrated subtype of LUAD was more sensitive to drugs, whereas there was no significant difference between the immune subtypes of LUSC. Further analysis by gender showed that female LUAD patients with high immune infiltration were more sensitive to drugs, whereas in LUSC it was male patients with high immune infiltration who were more sensitive. These results provide insights into the differences in the pathogenesis of the two lung cancers, and they facilitate the understanding of the biological mechanisms of these two cancers, the application of biomarkers, and the improvement of treatment methods.

3. By integrating the differentially expressed immune-related genes obtained in the two parts above, prognosis-related immune genes were identified using univariate Cox analysis. Eleven machine learning algorithms were then compared to select the best immune-related prognostic models. Random Survival Forest was the best model for LUAD, and the Generalized Boosted Regression Model was the best model for LUSC. The accuracy of each model was then verified on an independent validation set. Combining the prognostic models with clinical staging further improved model performance. The improvement was more pronounced for the LUAD prognostic model, especially when combined with Stage, which raised the average C-index from 0.7148 to 0.7545.

4. Based on the above results, we built the Lung Cancer Database (LungCancerDB, http://tmliang.cn/lung), which integrates multi-omics data of lung cancer with interactive online analysis tools. The analysis module of LungCancerDB mainly includes survival analysis, ssGSEA-based immune infiltration analysis, consensus clustering analysis, and gender difference correlation analysis. It aims to provide researchers with a powerful interactive analysis platform.

Taken together, this study integrated scRNA-seq data and TCGA transcriptome data to comprehensively explore the differences in molecular characteristics and immune-related molecular subtypes between LUAD and LUSC. It also selected the best prognostic models using multiple machine learning algorithms. The accuracy of these models was further verified, and their prognostic value was discussed in the context of clinical factors. Finally, based on these results, LungCancerDB was developed to provide a data reference and analysis platform for lung cancer-related research. These results may deepen the understanding of the molecular characteristics of the two lung cancer subtypes. They may also provide data support and a theoretical reference for the prognosis prediction, clinical decision-making, biomarker discovery, and personalized treatment of NSCLC patients.
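Part 2 scores immune infiltration with ssGSEA, but the abstract does not describe how that was implemented. The sketch below is a minimal, unnormalized version of the commonly used ssGSEA enrichment score for a single sample. Genes are ranked by expression, and the score sums, down the ranking, the difference between a rank-weighted running fraction of gene-set members and a plain running fraction of non-members. Several details are assumptions: the weight exponent `alpha=0.25`, the function name `ssgsea_score`, and the gene names `g1`…`g6`, which are hypothetical placeholders. Some implementations additionally normalize scores across samples, and that step is omitted here.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Unnormalized single-sample GSEA enrichment score for one sample (sketch).

    expression: dict mapping gene name -> expression value in a single sample.
    gene_set:   iterable of gene names (e.g. an immune cell signature).
    alpha:      exponent applied to rank weights (0.25 is an assumed default).
    """
    # Order genes from highest to lowest expression; the top gene gets rank N.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    ranks = {gene: n - pos for pos, gene in enumerate(ordered)}

    members = set(gene_set) & set(expression)
    if not members or len(members) == n:
        raise ValueError("gene_set must be a non-empty proper subset of measured genes")

    total_weight = sum(ranks[g] ** alpha for g in members)
    n_out = n - len(members)

    # Walk down the ranking, accumulating the rank-weighted fraction of set
    # members seen so far (p_in) and the plain fraction of non-members (p_out);
    # the enrichment score is the sum of their differences over all positions.
    p_in = p_out = score = 0.0
    for gene in ordered:
        if gene in members:
            p_in += ranks[gene] ** alpha / total_weight
        else:
            p_out += 1.0 / n_out
        score += p_in - p_out
    return score


# Hypothetical sample: six genes with strictly decreasing expression.
sample = {"g1": 6.0, "g2": 5.0, "g3": 4.0, "g4": 3.0, "g5": 2.0, "g6": 1.0}
print(ssgsea_score(sample, {"g1", "g2"}))  # positive: set genes sit at the top
print(ssgsea_score(sample, {"g5", "g6"}))  # negative: set genes sit at the bottom
```

In an analysis like the one summarized above, this score would be computed per sample and per immune signature. The resulting score matrix is the kind of input that consensus clustering could then partition into high- and low-infiltration subtypes.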
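Part 3 screens prognosis-related immune genes with univariate (single-gene) Cox analysis. The abstract names the method but not the software or the significance cutoff. The sketch below assumes the standard Cox proportional hazards model with Breslow handling of tied event times, fitted by Newton-Raphson. Each gene is tested with a Wald test. The helper names, the `p_cutoff=0.05` threshold, and the toy data are all hypothetical.

```python
import math


def _score_and_information(beta, times, events, x):
    """Partial-likelihood score and observed information for one covariate (Breslow ties)."""
    score = info = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        # Risk set: everyone still under observation at subject i's event time.
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        weights = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(weights)
        s1 = sum(w * x[j] for w, j in zip(weights, risk))
        s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk))
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean ** 2
    return score, info


def univariate_cox(times, events, x, max_iter=50, tol=1e-9):
    """Fit h(t|x) = h0(t) * exp(beta * x); return (beta, hazard ratio, Wald p-value)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(beta, times, events, x)
        if info <= 0:
            raise ValueError("covariate has no variation within the risk sets")
        step = score / info  # Newton-Raphson update on the log partial likelihood
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_information(beta, times, events, x)
    z = beta * math.sqrt(info)  # Wald z = beta / SE, with SE = 1 / sqrt(information)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return beta, math.exp(beta), p_value


def screen_prognostic_genes(times, events, expr_by_gene, p_cutoff=0.05):
    """Keep genes whose univariate Cox Wald p-value is below p_cutoff (assumed threshold)."""
    return [gene for gene, values in expr_by_gene.items()
            if univariate_cox(times, events, values)[2] < p_cutoff]


# Toy data: six patients, all with observed deaths; gene_a is higher in early deaths.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
gene_a = [1, 1, 0, 1, 0, 0]
beta, hr, p = univariate_cox(times, events, gene_a)
print(beta, hr, p)
```

A positive `beta` (hazard ratio above 1) marks a gene whose higher expression is associated with worse survival. Genes passing the cutoff would then feed the machine learning model comparison.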
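The models in part 3 are compared by C-index (for example, 0.7148 versus 0.7545 for the LUAD model with and without Stage). That figure measures how often a model assigns higher risk to the patient who dies first. The abstract does not say which C-index variant was used. The sketch below assumes Harrell's concordance index and excludes pairs with tied survival times, a convention that differs between implementations. It does not reproduce the reported values. `harrell_c_index` is a hypothetical name, and the risk scores could come from any model, such as a random survival forest or a model combined with stage.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index (sketch; pairs with tied times are skipped).

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter follow-up time than subject j. It is concordant when the
    model gives i the higher risk score; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Four patients; the third (time 5) is censored. One of five comparable pairs is discordant.
print(harrell_c_index([2, 4, 5, 7], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking. An increase such as the one reported for the LUAD model therefore means that more patient pairs are ordered correctly by predicted risk.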