Identification Of Breast Cancer Prognostic Gene Based On TCGA Database

Posted on: 2021-04-12; Degree: Master; Type: Thesis
Country: China; Candidate: L S Luo
GTID: 2370330611995966; Subject: Surgery
Abstract/Summary:
Objective: This study aimed to identify potential prognostic genes for breast cancer by mining and analyzing breast cancer data from the TCGA database with bioinformatics methods and software, such as Weighted Gene Co-expression Network Analysis (WGCNA) and the R language. It explored and analyzed breast cancer prognosis-related genes, searched for new targets for breast cancer treatment, constructed a new model to evaluate prognostic risk in breast cancer, and discussed the feasibility and value of this model's risk prediction in clinical application, with the aim of providing a new direction and method for precise breast cancer treatment.

Methods:
1. Clinical data, pathological information and gene sequencing data for breast cancer tissue samples and paracancerous tissue samples were collected from the TCGA database. All differentially expressed genes (DEGs) were taken as the study dataset and subjected to cluster identification and differential analysis.
2. GO functional enrichment analysis and KEGG pathway analysis were used to explore the functions of the differentially expressed genes, including biological processes, molecular functions, cellular components and related pathways.
3. Protein-protein interaction (PPI) networks for all differentially expressed genes were constructed with Cytoscape to identify and visualize closely related gene clusters.
4. WGCNA was used to screen for genes related to breast cancer prognosis, followed by cluster identification and correlation analysis of those genes. Cytoscape's MCODE plug-in was used to analyze and visualize the regulatory relationships among the protein interactions. Cox regression analysis was then applied to the screened genes to identify those that were statistically significant.
5. The prognostic risk model of breast cancer was constructed in R. The risk groups were verified by Kaplan-Meier (K-M) analysis, and ROC curves were drawn to verify the reliability of the prognostic risk score. Finally, the correlation between the prognostic risk score and breast cancer prognosis was verified by analyzing factors related to the risk score.

Results:
1. A total of 1217 differentially expressed genes, including 743 up-regulated and 474 down-regulated genes, were screened from the TCGA breast cancer gene expression data, which comprised 112 paracancerous tissue samples and 1066 breast cancer tissue samples (|log FC| > 2 and FDR < 0.05).
2. In GO functional enrichment analysis, the differentially expressed genes mainly participated in biological processes such as DNA packaging complex, muscle system process, mitotic nuclear division and response to xenobiotic stimulus. Related molecular functions included glycosaminoglycan binding and microtubule cytoskeleton organization involved in mitosis, and the gene products were located mainly in the extracellular matrix. In KEGG pathway analysis, the differentially expressed genes were mainly related to the PPAR signaling pathway, the cAMP signaling pathway, tryptophan metabolism, and protein digestion and absorption.
3. Eleven closely related gene clusters were found in the PPI network.
4. WGCNA identified 11 modules, and the module with the highest correlation, which contained 77 genes, was selected. The relationships among the genes in this module were visualized. Univariate Cox analysis showed that six genes (TRDN, ST8SIA6-AS1, HHIPL2, SAA1, SAA2-SAA4, SAA4) were statistically significant. Two genes, TRDN and ST8SIA6-AS1, were statistically significant in multivariate Cox analysis.
5. A clinical predictive risk model composed of TRDN, ST8SIA6-AS1, SAA2-SAA4 and HHIPL2 was constructed in R. K-M analysis showed a statistically significant difference in survival between the high-risk and low-risk groups (P=0.0002775). ROC curves showed that the AUCs of the model's 1-year, 3-year and 5-year predictions were 0.664, 0.671 and 0.625 respectively, which verified the reliability of the prognostic risk score. Analysis of related factors verified that the prognostic risk score can be used independently in diagnosis and treatment.

Conclusions:
1. TRDN and ST8SIA6-AS1 can be used as independent prognostic factors or potential targets for breast cancer.
2. The prognostic risk score, constructed from the TRDN, SAA2-SAA4, ST8SIA6-AS1 and HHIPL2 genes, can be used on its own to evaluate the prognostic risk of breast cancer patients, without assessing TNM staging or other clinicopathological features.
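
The abstract does not reproduce the thesis's R code, so the following is only an illustrative sketch, written in Python with the standard library, of two steps it describes: screening differentially expressed genes by fold change and FDR, and turning a multi-gene Cox model into a per-patient risk score with high-risk and low-risk groups. Several things here are assumptions rather than details from the thesis: the fold-change threshold is read as an absolute value (|log FC| > 2) because down-regulated genes are also reported, the risk score is taken to be the usual sum of coefficient times expression, and the high/low cutoff is taken to be the cohort median. All function names, coefficients and expression values are hypothetical.

```python
from statistics import median

# Thresholds quoted in the abstract; the absolute value is an assumption.
LOGFC_CUTOFF = 2.0
FDR_CUTOFF = 0.05


def split_degs(table):
    """Split (gene, log_fc, fdr) rows into up- and down-regulated gene lists."""
    up, down = [], []
    for gene, log_fc, fdr in table:
        if fdr >= FDR_CUTOFF or abs(log_fc) <= LOGFC_CUTOFF:
            continue  # not differentially expressed under these cutoffs
        (up if log_fc > 0 else down).append(gene)
    return up, down


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def assign_risk_groups(scores):
    """Label a sample 'high' if its score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {sample: ("high" if s > cutoff else "low") for sample, s in scores.items()}


# Hypothetical inputs, made up for illustration only (not thesis data).
toy_table = [
    ("GENE_A", 3.1, 0.001),   # up-regulated
    ("GENE_B", -2.6, 0.010),  # down-regulated
    ("GENE_C", 1.2, 0.001),   # fold change too small
    ("GENE_D", 4.0, 0.200),   # FDR too high
]
# Made-up coefficients; the abstract does not report the fitted values.
toy_coefficients = {"TRDN": 0.5, "ST8SIA6-AS1": -0.25, "SAA2-SAA4": 0.125, "HHIPL2": 0.25}
toy_expression = {
    "S1": {"TRDN": 2.0, "ST8SIA6-AS1": 4.0, "SAA2-SAA4": 0.0, "HHIPL2": 0.0},
    "S2": {"TRDN": 4.0, "ST8SIA6-AS1": 0.0, "SAA2-SAA4": 8.0, "HHIPL2": 4.0},
    "S3": {"TRDN": 2.0, "ST8SIA6-AS1": 0.0, "SAA2-SAA4": 0.0, "HHIPL2": 4.0},
}

up_genes, down_genes = split_degs(toy_table)
toy_scores = {s: risk_score(expr, toy_coefficients) for s, expr in toy_expression.items()}
toy_groups = assign_risk_groups(toy_scores)
```

In a real analysis the coefficients would come from the fitted multivariate Cox model, and the resulting groups would then be compared with a K-M analysis as the abstract describes; neither of those steps is reproduced in this sketch.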
Keywords/Search Tags:breast cancer, differentially expressed genes, prognostic risk score