In recent years, the combination of multi-omics data and machine learning has become a focus of cancer research and a new approach to cancer prediction. Clear cell renal cell carcinoma (ccRCC), the most common subtype of renal cell carcinoma (RCC), remains a difficult research problem: its mortality rate is rising, and research on effective treatments is limited. This research applies machine learning methods to multi-omics data from ccRCC patients and builds models to help doctors make diagnostic decisions. The main goal is to identify, at the molecular level, key genes with pathogenic relevance to cancer classification or progression, and thereby provide a theoretical basis for improving personalized treatment. The main research content and contributions of this thesis are as follows:

1. Identifying key ccRCC-related molecules with complex networks. Three networks (miRNA-mRNA, lncRNA-miRNA and lncRNA-mRNA) were constructed for the differentially expressed mRNAs (DEmRNAs), miRNAs (DEmiRNAs) and lncRNAs (DElncRNAs) of ccRCC, using online databases or the WGCNA algorithm. Based on degree, closeness centrality and betweenness centrality, hsa-mir-155, hsa-mir-200c, hsa-mir-122, hsa-mir-506, hsa-mir-216b, hsa-mir-141 and the lncRNAs AC137723.1 and AC021074.3 were identified as crucial genes involved in regulating the proliferation, metastasis and invasion of ccRCC cells. These three single-layer networks were then integrated into a lncRNA-miRNA-mRNA multilayer network, in which complex network analysis screened out hsa-mir-122 as the only crucial gene across all three layers, suggesting that hsa-mir-122 may play a role in ccRCC. A lncRNA-hsa-mir-122-mRNA network centered on hsa-mir-122 was therefore constructed. Pathway analysis of GALNT3, the unique target gene linked to hsa-mir-122, showed that GALNT3 is involved in mucin-type O-glycan biosynthesis. Among the clinically significant lncRNAs linked to hsa-mir-122 in this network, AC090377.1 is the only one with target genes, and pathway analysis of AC090377.1 suggested that GUCY2F is enriched in the retina-associated phototransduction pathway. Taken together, the single-layer and three-layer analyses identify hsa-mir-122 as an important molecule in the oncogenesis and progression of ccRCC, offering new directions for further study of the carcinogenic mechanism of ccRCC.

2. Identifying key features of ccRCC subtypes. First, two ccRCC subtypes, named clear cell type 1 (ccluster1) and clear cell type 2 (ccluster2), were determined from mRNA, miRNA and lncRNA expression using consensus clustering, K-Means and the EM algorithm. Then, based on the ceRNA network, the optimal feature combination was selected using a random forest and a greedy algorithm. In addition, competing gene pairs with survival significance, identified by univariate Cox regression analysis, further helped distinguish the two subtypes. The resulting classification features are competing gene pairs centered on miR-106a, miR-192, miR-193b, miR-454, miR-32, miR-98, miR-143, miR-145, miR-204, miR-424 and miR-1271, with a prediction accuracy above 92%. Changes in miR-106 and OIP5-AS1 affect cell proliferation and the prognosis of ccluster1, whereas changes in miR-145 and FAM13A-AS1 in ccluster2 affect cell invasion, apoptosis, migration and metabolic function; miR-192 displays a unique characteristic in both subtypes. The two subtypes also differ notably across several pathways. Tumors in ccluster1 are characterized by the Fc gamma R-mediated phagocytosis pathway, which affects tissue remodeling and repair, whereas tumors in ccluster2 are characterized by the EGFR tyrosine kinase inhibitor resistance pathway, which participates in the regulation of cell homeostasis. In conclusion, identifying these gene pairs may shed light on the therapeutic mechanisms of the ccRCC subtypes.

3. Identifying key genes with complex networks and machine learning. First, a robust and sparse correlation matrix estimator was used to construct a gene-gene interaction network from gene pairs with a correlation coefficient greater than 0.6. PageRank and consensus clustering were then combined to select the top-ranked genes; the top 31 genes by PageRank (PR) value distinguished the subtypes best, with a silhouette coefficient of 0.913. GO analysis of the target genes showed that ATXN1L, STAT5B, KMT2A, KAT7 and ASH1L are enriched in the largest number of GO terms, and these genes have an effect on cell proliferation. In addition, survival analyses by stage and grade of the two major subtypes were highly significant, providing a theoretical basis for prognostic research.

Overall, this thesis uses machine learning methods to build a series of biological prediction models and to mine multi-omics data. Combining complex network theory with computational methods, it explores correlations within the data to support a deeper understanding of cancer.
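Contribution 1 screens hub molecules by degree, closeness centrality and betweenness centrality. The thesis does not state how the three measures are combined, so the following is only a minimal standard-library Python sketch under an assumed rule: a node counts as crucial if it ranks in the top `k` under all three measures. The edge list, the node names (`miR-X`, `g1`, ...) and `k` are hypothetical. Closeness uses the Wasserman-Faust scaling so that disconnected graphs are handled, and betweenness uses Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from (regulator, target) pairs."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def degree_centrality(g: Graph) -> Dict[str, float]:
    n = len(g)
    return {v: len(nbrs) / (n - 1) for v, nbrs in g.items()}


def closeness_centrality(g: Graph) -> Dict[str, float]:
    n = len(g)
    scores = {}
    for s in g:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS shortest-path lengths from s
            u = queue.popleft()
            for w in g[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total, reach = sum(dist.values()), len(dist) - 1
        # Wasserman-Faust scaling keeps scores comparable across components
        scores[s] = (reach / total) * (reach / (n - 1)) if total > 0 else 0.0
    return scores


def betweenness_centrality(g: Graph) -> Dict[str, float]:
    """Brandes' algorithm, normalized to [0, 1] for an undirected graph."""
    bc = dict.fromkeys(g, 0.0)
    for s in g:
        stack, preds = [], {v: [] for v in g}
        sigma = dict.fromkeys(g, 0)
        dist = dict.fromkeys(g, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(g, 0.0)
        while stack:  # accumulate pair dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(g)
    # each unordered pair is counted twice, so divide by 2 * C(n-1, 2)
    norm = (n - 1) * (n - 2)
    return {v: b / norm if norm else 0.0 for v, b in bc.items()}


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    # ties are broken alphabetically so the result is deterministic
    return set(sorted(scores, key=lambda v: (-scores[v], v))[:k])


def crucial_nodes(g: Graph, k: int) -> Set[str]:
    """Assumed rule: nodes in the top k under all three centralities."""
    measures = (degree_centrality(g), closeness_centrality(g), betweenness_centrality(g))
    return set.intersection(*(top_k(m, k) for m in measures))


# Hypothetical toy miRNA-mRNA network
edges = [("miR-X", "g1"), ("miR-X", "g2"), ("miR-X", "g3"),
         ("miR-Y", "g3"), ("miR-Y", "g4")]
net = build_graph(edges)
print(sorted(crucial_nodes(net, k=2)))  # ['g3', 'miR-X']
```

Requiring agreement across all three measures is a conservative choice; a union or a rank average would admit more candidates.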
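Contribution 2 selects an optimal feature combination with a random forest and a greedy algorithm. The sketch below illustrates one common greedy scheme, forward selection, which repeatedly adds the feature that most improves a classifier score and stops once no remaining feature helps. This is an assumption about how the greedy step works, not the thesis's code. To stay dependency-free, it scores each subset by the leave-one-out accuracy of a nearest-centroid classifier, which is swapped in here in place of the random forest; any scorer with the same hypothetical `score_fn(X, y, features)` signature (for example, cross-validated random-forest accuracy) could be plugged in instead. The data matrix is made up.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
ScoreFn = Callable[[Matrix, Sequence[str], List[int]], float]


def nearest_centroid_loo(X: Matrix, y: Sequence[str], features: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`.

    Assumes every class has at least two samples.
    """
    labels = sorted(set(y))  # fixed order makes tie-breaking deterministic
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in labels:
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
        point = [X[i][f] for f in features]
        pred = min(labels, key=lambda c: math.dist(point, centroids[c]))
        correct += pred == y[i]
    return correct / len(X)


def greedy_forward_selection(X: Matrix, y: Sequence[str], score_fn: ScoreFn,
                             max_features: int = 10) -> Tuple[List[int], float]:
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    best = 0.0
    while remaining and len(selected) < max_features:
        # score every one-feature extension of the current subset
        scored = [(score_fn(X, y, selected + [f]), f) for f in remaining]
        score, feat = max(scored, key=lambda t: t[0])
        if score <= best:
            break  # no remaining feature improves the score
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


# Hypothetical matrix: feature 0 separates the classes, features 1 and 2 are noise
X = [[0.1, 5, 3], [0.2, 1, 7], [0.15, 9, 2],
     [0.9, 4, 6], [1.0, 8, 1], [0.95, 2, 5]]
y = ["ccluster1"] * 3 + ["ccluster2"] * 3
print(greedy_forward_selection(X, y, nearest_centroid_loo))  # ([0], 1.0)
```

Forward selection evaluates O(p²) subsets in the worst case for p candidate features, which is why it is usually run on a pre-filtered candidate set (here, in the thesis's setting, features drawn from the ceRNA network).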
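Contribution 3 thresholds a gene-gene correlation matrix at 0.6 and ranks genes by PageRank. A minimal standard-library sketch of that pipeline follows. It substitutes plain Pearson correlation for the robust sparse estimator used in the thesis, keeps an edge only when r > 0.6 (taking the stated threshold literally; an |r| > 0.6 threshold is an equally plausible reading), and runs PageRank by power iteration with the conventional damping factor 0.85, spreading the rank held by isolated genes uniformly over all genes. The gene names and expression values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def correlation_network(expr: Dict[str, Sequence[float]],
                        threshold: float = 0.6) -> Dict[str, Set[str]]:
    """Undirected network linking genes whose correlation exceeds `threshold`."""
    genes = list(expr)
    adj: Dict[str, Set[str]] = {g: set() for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            if pearson(expr[gi], expr[gj]) > threshold:
                adj[gi].add(gj)
                adj[gj].add(gi)
    return adj


def pagerank(adj: Dict[str, Set[str]], damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 500) -> Dict[str, float]:
    nodes = list(adj)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # rank held by isolated (dangling) genes is redistributed uniformly
        dangling = sum(pr[v] for v in nodes if not adj[v])
        new = {v: (1 - damping) / n
                  + damping * (sum(pr[u] / len(adj[u]) for u in adj[v]) + dangling / n)
               for v in nodes}
        if sum(abs(new[v] - pr[v]) for v in nodes) < tol:
            return new
        pr = new
    return pr


def top_genes(pr: Dict[str, float], k: int) -> List[str]:
    # highest PageRank first; ties broken alphabetically
    return sorted(pr, key=lambda g: (-pr[g], g))[:k]


# Hypothetical expression profiles over five samples
expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 4, 3, 2, 1],
    "geneD": [1, 3, 2, 5, 4],
}
net = correlation_network(expr)
pr = pagerank(net)
print(top_genes(pr, 3))  # ['geneA', 'geneB', 'geneD']
```

Here geneC is anti-correlated with every other gene, so under the r > 0.6 reading it stays isolated and receives only the teleport and dangling share of rank (1/21 at convergence).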
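Contribution 3 reports a silhouette coefficient of 0.913 for the subtypes defined by the top 31 genes. For reference, each sample i receives s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is its mean distance to the other members of its own cluster and b(i) is its smallest mean distance to the members of any other cluster; the coefficient is the mean of s(i) over all samples. The sketch below assumes Euclidean distance and scores samples in singleton clusters as 0, a common convention also used by scikit-learn. The thesis does not state its distance metric, and the toy points are hypothetical.

```python
import math
from typing import Hashable, Sequence


def silhouette_coefficient(points: Sequence[Sequence[float]],
                           labels: Sequence[Hashable]) -> float:
    """Mean silhouette width using Euclidean distance."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        # distances from sample i to every other sample, grouped by cluster
        by_cluster = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                by_cluster[labels[j]].append(math.dist(p, q))
        own = by_cluster[labels[i]]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i] and d)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


# Hypothetical 2-D sample embeddings with two well-separated subtypes
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette_coefficient(pts, [1, 1, 2, 2]), 3))  # 0.9
```

Values close to 1, such as the 0.913 reported above, indicate that samples sit much closer to their own subtype than to the other one.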