
Construction of a Model for the Diagnosis and Prediction of Esophageal Squamous Cell Carcinoma Based on Bioinformatic Data Mining Technology

Posted on: 2020-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: N J Liu
GTID: 2404330599454527
Subject: Biology
Abstract/Summary:
INTRODUCTION: Esophageal cancer is one of the most malignant tumors worldwide and is divided into two pathological subtypes: esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma. The former is mainly distributed in Southeast Asia and is also the main subtype in China. Because ESCC patients show few clinical symptoms in the early phase, the disease is easily misdiagnosed, with a poor survival rate and high mortality. Recent research has found that many microRNAs (miRNAs) regulate cell proliferation, apoptosis, differentiation, invasion, metastasis and other processes, and miRNAs are recognized to play pivotal roles in ESCC development. Numerous studies have revealed that abnormal expression of tumor-associated miRNAs may appear earlier than clinical symptoms, so these miRNAs have the potential to become novel biomarkers for early diagnosis. In this study, miRNA expression profiles of ESCC were analyzed together with patients' clinical data, using data mining algorithms (such as logistic regression, random forest, support vector machine and K-nearest neighbor), to establish a diagnostic prediction model that assists early diagnosis and effective intervention in ESCC.

METHODS: Two sets of miRNA expression profiles with clinical information from ESCC patients were collected from GEO (216 cases) and TCGA (108 cases) and analyzed. The optimal miRNA feature subsets were screened by combining differential expression analysis with the random forest algorithm. Logistic regression, random forest, support vector machine and K-nearest neighbor were then used to build candidate models, and the classification and prediction performance of the four models was compared on accuracy, sensitivity, specificity and the ROC curve. Bioinformatics analyses, including target gene prediction, construction of a miRNA-mRNA regulatory network, functional enrichment and survival analysis, were used to evaluate the biological significance of the miRNA set in the optimal model.

RESULTS: A total of 101 miRNAs were found to be differentially expressed in the two datasets when miRNA expression profiles of ESCC tissues were compared with those of normal esophageal tissues. Among them, seven miRNAs (hsa-miR-93, hsa-miR-503, hsa-miR-23a, hsa-miR-493, hsa-miR-375, hsa-miR-99a and hsa-miR-195) were selected as candidate features by the random forest algorithm. The best ESCC classifier, composed of five miRNAs (hsa-miR-93, hsa-miR-503, hsa-miR-375, hsa-miR-195 and hsa-miR-493) and constructed with logistic regression combined with L1 regularization, outperformed the other models and reached accuracies of 98.46% and 96.29% in the two verification sets, respectively. Sensitivity and specificity were both higher than 0.95, and the two AUC values were 0.998 and 0.983. For the five miRNAs in the classifier, 446 target genes were selected using the miRTarBase database. By integrating gene differential expression analysis, a miRNA-mRNA regulatory network related to ESCC was further constructed. Functional enrichment analysis revealed that genes in the regulatory network were involved in regulation of transcription, DNA-templated; positive regulation of cell proliferation; cell division; positive regulation of gene expression; and other regulatory processes related to cellular expression. These genes also took part in pathways in cancer, the Hippo signaling pathway, the cell cycle, the p53 signaling pathway, the Wnt signaling pathway and other cancer-related signaling pathways. Furthermore, the expression of the five miRNAs, combined as a panel according to the classifier, showed a positive correlation with the survival time of ESCC patients, so the panel may serve as a new biomarker for disease prognosis.

CONCLUSIONS: A new classifier based on hsa-miR-93, hsa-miR-503, hsa-miR-375, hsa-miR-195 and hsa-miR-493 could serve as an auxiliary diagnostic and prognostic tool to accurately distinguish ESCC patients from the normal group. In addition, the biological functions of these five miRNAs could provide new understanding of, and targets for, the effective prevention of ESCC.
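
The abstract compares its four models on accuracy, sensitivity, specificity and the ROC curve (AUC). As a minimal sketch of how these four measures are defined, the Python below computes them from binary labels and classifier scores. The function names, the 0.5 decision threshold and the toy data are assumptions made for illustration only; they are not the thesis's code or data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = ESCC, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a few hundred samples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores, not data from the thesis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is presumably why the abstract reports both.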
Keywords/Search Tags:ESCC, data mining, miRNA, bioinformatics, diagnostic prediction model