Transcriptomics-based Screening Of Early Diagnostic And Prognostic Markers For Non-small Cell Lung Cancer

Posted on: 2022-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Liang
GTID: 2480306512963269
Subject: Master of Engineering
Abstract:
Background: Lung cancer ranks first among malignant tumors in both morbidity and mortality, and is characterized by high malignancy and a tendency to recur and metastasize. Lung cancer is divided into non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC), of which NSCLC accounts for 85% of cases. Because NSCLC has no obvious symptoms in its early stage, most patients are already in the middle or late stage when diagnosed. They miss the best window for treatment, and the 5-year survival rate is only 5%. Despite advances in diagnosis and treatment over the last 30 years, biomarkers for early detection, for predicting populations at high risk of recurrence and death, and for identifying targeted therapies or immunotherapeutic approaches for NSCLC patients remain unsatisfactory. Identifying effective biomarkers is therefore crucial for the early diagnosis and prognosis of NSCLC patients.

Methods: (1) Transcriptome sequencing data and clinical information of NSCLC patients were downloaded from public databases. The data were processed with a standardized workflow of quality control, read alignment, and quantification to obtain gene expression count files. Differential expression analysis was then performed using DESeq2 and edgeR, and the counts were normalized using the vst function of DESeq2. Since differences in sequencing technologies across platforms introduce batch effects, the sva package was used to remove them, yielding expression files suitable for building early diagnostic models and screening prognostic markers. (2) Weighted Gene Co-expression Network Analysis (WGCNA) was used to identify a gene module closely related to NSCLC. Five machine learning diagnostic models were constructed from the differentially expressed genes in this module, and the optimal model parameters were selected by ten-fold cross-validation. (3) The differential gene expression data were also used to screen for markers closely associated with NSCLC prognosis using univariable Cox analysis and LASSO analysis. A prognostic signature was established from the prognostic genes using the risk score method. Kaplan-Meier curves with the log-rank test and time-dependent receiver operating characteristic (ROC) curves were used to evaluate the prognostic value and the performance of the signature, respectively. An independent prognostic analysis was also performed to determine whether the signature could serve as an independent prognostic factor.

Results: (1) Four datasets, GSE87340, GSE140343, LUAD (lung adenocarcinoma), and LUSC (lung squamous carcinoma), containing sequencing data and clinical information for 1012 NSCLC patients, were retrieved from the TCGA and GEO databases. Differential analysis yielded 4265 differentially expressed genes. (2) WGCNA produced 55 gene modules, among which the yellow module was closely associated with NSCLC. Combined with the differential expression results, the yellow module contained 760 differentially expressed genes. Five NSCLC early diagnosis models were constructed from the expression matrix of these 760 genes with ten-fold cross-validation, and the models also showed good predictive ability in the validation group. The diagnostic models built with SVM, Neural Network, and boosted GLM had an accuracy of more than 99% in the validation group. Of these, the SVM model had the highest specificity and was selected as the final diagnostic model. (3) A list of immune genes was obtained from the ImmPort database, and intersecting it with the differential analysis results gave a total of 449 differentially expressed immune genes. Univariable Cox analysis identified sixty-three immune genes associated with survival in NSCLC. LASSO analysis further narrowed these to four immune genes: UCN2, RPPIN, EREG, and BIRC5. A prognostic risk model based on the four immune genes was constructed by multivariable Cox analysis. Kaplan-Meier survival curves indicated a poorer prognosis in the high-risk group (p<0.01). In univariable and multivariable independent prognostic analyses, the risk score was statistically significant (p<0.01) and independently predicted patient prognosis.

Conclusion: In this study, an early diagnosis model of NSCLC with an accuracy and specificity of more than 99.9% was constructed based on transcriptome sequencing data. In addition, four immune genes closely related to the prognosis of NSCLC were identified: UCN2, RPPIN, EREG, and BIRC5. The prognostic risk model built on these four genes has good predictive efficacy and can be used as an independent prognostic factor.
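
Illustrative note: the abstract names the risk score method without giving its formula. A common form, assumed here and not confirmed by the thesis, is a weighted sum of each gene's expression by its Cox regression coefficient, with patients split into high- and low-risk groups at the median score. The sketch below shows that calculation only. The coefficient values, patient IDs, and expression values are hypothetical placeholders; the abstract does not report any of them.

```python
from statistics import median

# Hypothetical Cox coefficients, for illustration only.
# The abstract lists the four genes but not their coefficients.
COEFFICIENTS = {"UCN2": 0.10, "RPPIN": 0.20, "EREG": 0.05, "BIRC5": 0.15}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Weighted sum of gene expression values using Cox coefficients."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def assign_risk_groups(patients, coefficients=COEFFICIENTS):
    """Score each patient and split at the median score (assumed cutoff)."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return scores, groups


# Made-up normalized expression values for four hypothetical patients.
patients = {
    "P1": {"UCN2": 1.0, "RPPIN": 2.0, "EREG": 3.0, "BIRC5": 4.0},
    "P2": {"UCN2": 0.0, "RPPIN": 0.0, "EREG": 0.0, "BIRC5": 0.0},
    "P3": {"UCN2": 2.0, "RPPIN": 2.0, "EREG": 2.0, "BIRC5": 2.0},
    "P4": {"UCN2": 4.0, "RPPIN": 4.0, "EREG": 4.0, "BIRC5": 4.0},
}

scores, groups = assign_risk_groups(patients)
for pid in sorted(patients):
    print(pid, round(scores[pid], 3), groups[pid])
```

In a workflow of this kind, the resulting high and low groups would then be compared with Kaplan-Meier curves and a log-rank test, as the Methods describe.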
Keywords: NSCLC, Early diagnostics, Prognosis, Machine learning