Discovering Cancer Candidate Genes By Integrating Gene-cancer Association, Network Properties, Sequence Features And Functional Annotations

Posted on: 2020-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Fan
Full Text: PDF
GTID: 2404330596496453
Subject: Information Science
Abstract/Summary:
Objective: Cancer is a complex disease caused by a variety of genetic changes. Cancer genes play a vital role in the development of cancer, but currently known cancer genes account for only 2% of the human genome, and a large number of cancer genes remain undiscovered. Because identifying cancer genes with traditional experimental methods is time-consuming and laborious, this study collects properties of known genes from four sources (experimental evidence, PPI networks, sequences and functions) and uses machine learning to predict potential cancer genes, in order to provide a reference for further understanding the mechanisms of cancer and for developing effective cancer treatments.

Methods: Cancer-related genes were screened from the OpenTargets database, together with related variables such as pathways, genetic associations, animal models and RNA expression. Protein interaction data were downloaded from the DIP, HPRD and BioGRID databases, and Cytoscape was used to construct the PPI network and compute its topological properties. Protein sequences were downloaded from the UniProt database, and PROFEAT was used to calculate structural and physicochemical sequence features. GO terms and KEGG pathways were obtained from the GO and KEGG databases. These four types of gene features were integrated, with known cancer genes from the CGC database as the outcome variable. Variables were screened using ANOVA and logistic regression, a combined sampling method was used to obtain a balanced data set, and five machine learning methods (RF, GBM, SVM, ANNs and Naive Bayes) were used to predict potential cancer genes. The predictions were validated in the cBioPortal database, with the gene BLK and colon cancer taken as a specific example.

Results: The final model included 62 variables, of which the PPI network topological attributes were the most important. Cancer genes and non-cancer genes differed markedly in the distributions of average shortest path length, degree, CNR, amino acid composition and dipeptide composition, and in GO and KEGG enrichment. Among the 15 models built with the five algorithms, the PPI+OpenTargets+Sequence+Function model performed best across the five algorithms, with an average AUC of 0.885. Twenty potential cancer genes were predicted, which showed a certain degree of mutation and amplification in different cancer tissues.

Conclusion: In this study, gene-cancer associations, PPI network properties, sequence features and functional annotations of genes were obtained from authoritative biomedical databases, and machine learning algorithms were used to predict potential cancer genes. The prediction performance was good, indicating that integrating multiple data sources with machine learning methods to predict cancer genes is feasible and can provide a reference for the discovery of cancer genes and for gene therapy of cancer.
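
Several of the features named in the abstract (amino acid composition, dipeptide composition, degree, average shortest path length) have simple textbook definitions. The following is a minimal sketch of those definitions in plain Python, not the thesis's actual pipeline, which used PROFEAT and Cytoscape. The function names, the toy sequence, and the partner nodes P1 to P3 are hypothetical, and the exact normalisation used by those tools may differ.

```python
from collections import Counter, deque
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in a protein sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return {a: counts[a] / total for a in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each of the 400 ordered residue pairs among adjacent pairs."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    total = sum(pairs[k] for k in keys) or 1  # avoid division by zero
    return {k: pairs[k] / total for k in keys}


def build_graph(edges):
    """Undirected adjacency sets from (protein, protein) interaction pairs."""
    graph = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph, node):
    """Number of distinct interaction partners of a node."""
    return len(graph[node])


def average_shortest_path_length(graph, source):
    """Mean breadth-first-search distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for n, d in dist.items() if n != source]
    return sum(others) / len(others) if others else 0.0


# Toy data (hypothetical): a short sequence and a four-node network.
toy_sequence = "MKKA"
toy_graph = build_graph([("BLK", "P1"), ("P1", "P2"), ("P1", "P3"), ("P2", "P3")])

print(amino_acid_composition(toy_sequence)["K"])
print(degree(toy_graph, "BLK"))
print(average_shortest_path_length(toy_graph, "BLK"))
```

In a setting like the one described, each gene would contribute one row of such values, alongside the OpenTargets and GO/KEGG variables, before variable screening and model fitting.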
Keywords/Search Tags: PPI network, Ensemble learning, OpenTargets, cancer gene prediction