
Statistical Learning-Based Selection of Staging Characteristic Genes and Prognostic Genes for Thyroid Cancer

Posted on: 2022-11-03  Degree: Master  Type: Thesis
Country: China  Candidate: J Y Wang  Full Text: PDF
GTID: 2504306749450304  Subject: Automation Technology
Abstract/Summary:
In recent years, the incidence of malignant tumours has been increasing worldwide, and thyroid cancer is no exception. Cancer usually has no obvious symptoms in its initial stages, so by the time it is detected it has often reached a middle or late stage. As human genome testing has become more sophisticated, studying cancer at the genome level has become very popular. The key difficulty in studying cancer at the genetic level is the data: gene expression data are typically high-dimensional with a small sample size, and they contain a large number of irrelevant genes from which researchers must extract the informative ones. The basic idea is therefore to eliminate irrelevant genes using criteria set in advance, and then to apply a series of methods that select the best characteristic genes so as to obtain high classification accuracy.

The aim of this study was to select genes that are characteristic of, and prognostic for, thyroid cancer in T, N and M staging. This dissertation used the gene expression data, clinical data and survival data for thyroid cancer from the TCGA database. First, the downloaded data were organized. Second, the variables in T, N and M staging were reclassified according to different rules. Third, DESeq2 differential analysis was performed to screen out the differentially expressed genes in T, N and M staging. Based on these differentially expressed genes, LASSO regression, a simple random forest and the random forest recursive feature elimination algorithm were then used to select characteristic (feature) genes. Finally, the feature genes screened by each of the three methods were tested for classification accuracy using a support vector machine. The results showed that the feature genes screened by random forest recursive feature elimination gave higher classification accuracy than those screened by the simple random forest. To further improve classification accuracy, the dissertation intersected the feature genes screened by LASSO regression with those screened by the random forest, and likewise intersected the feature genes screened by LASSO regression with those screened by random forest recursive feature elimination, and then verified the classification accuracy of each intersection with a support vector machine. The results show that, in T and N staging, intersecting the feature genes obtained by LASSO regression and random forest recursive feature elimination both reduces the number of feature genes and improves classification accuracy.

GO biological process enrichment and KEGG pathway enrichment were performed on the genes selected at each stage to investigate the biological processes in which they are involved and their role in tumorigenesis. Genes associated with survival were screened by univariate Cox regression; based on these, a prognostic model was constructed using multivariate Cox regression, and possible independent prognostic genes were screened. The 5-year survival ROC curves showed that the constructed prognostic model had moderate to high predictive power. Finally, using K-M survival curves, one possible prognostic gene, FCRLB, was selected in T staging and one possible prognostic gene, SPSB4, was selected in N staging. Correct identification of prognostic genes benefits the prognostic outcome of cancer patients. Information about the patient at the genetic level can help doctors determine the patient's prognosis more accurately and formulate appropriate treatment plans, which can guide the later treatment of cancer patients and contribute to improving their survival rate.
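The K-M survival curves mentioned in the abstract rest on the Kaplan-Meier product-limit estimator, S(t) = Π (1 - d_i / n_i) over event times t_i ≤ t, where d_i is the number of deaths at t_i and n_i the number of patients still at risk. The following is only a minimal sketch of that estimator in plain Python; the function name `kaplan_meier` and the toy follow-up values are hypothetical illustrations and are not taken from the dissertation or from TCGA.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  -- follow-up duration for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs, one per distinct death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    deaths = Counter(t for t, e in zip(times, events) if e)  # d_i per time
    exits = Counter(times)  # deaths plus censorings leaving the risk set
    at_risk = len(times)    # n_i before the earliest time
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # censored patients leave after time t
    return curve


if __name__ == "__main__":
    # Toy data only (hypothetical): two groups split by a gene's expression.
    high = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
    low = kaplan_meier([2, 4, 5, 5, 6], [0, 1, 0, 0, 0])
    print("high expression:", high)
    print("low expression:", low)
```

In practice the two curves would be compared with a log-rank test, which this sketch does not implement.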
Keywords/Search Tags: DESeq2 differential analysis, LASSO, random forest recursive feature elimination, Cox regression, K-M curve