
Research On Identification Methods Of Cancer-related Driver Genes

Posted on: 2022-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Guo
Full Text: PDF
GTID: 2504306731987999
Subject: Computer Science and Technology
Abstract/Summary:
As cancer research has deepened, the importance of mutations in cancer evolution has become widely recognized, and driver mutations, which play a decisive role in the direction and degree of that evolution, have drawn the attention of researchers. Driver mutations give tumor cells a selective advantage that allows them to evade the body’s immune surveillance, divide in large numbers, and gradually threaten the body’s tissues and organs. Further study of this driver mechanism has revealed that mutations in cancer cells are not evenly distributed at the gene level: large numbers of mutations cluster on cancer genes, a phenomenon called a mutation cluster. At the same time, because of tumor heterogeneity, previous studies of driver genes were often affected by differences between individuals and between tumor cells within the same individual, leading to a large number of false positives in driver gene identification. If the study of drivers is moved from the level of single genes to the level of gene sets (pathways) and mutation clusters, the influence of tumor heterogeneity can be overcome to some extent. This thesis therefore proposes driver gene identification methods at the mutation cluster level and at the pathway level. The main work is as follows:

(1) A cancer gene identification method based on mutation clusters (HEA) is proposed. Existing methods that identify driver genes at the mutation cluster level depend on parameters that must be set from expert experience, which lowers the robustness of the algorithms. Existing algorithms also tend to identify short mutation clusters and do not recognize the important role of long mutation clusters in the development of cancer. HEA identifies mutation clusters by dynamic iteration, and it uses global mutation information in addition to the local position information of each mutation. HEA is a robust and efficient computational method that is biased toward long clusters and yields highly enriched clusters. A total of 1846 mutation clusters were identified by HEA. HEA was compared with other cluster identification methods (M2C, OncodriveCLUST, and Pfam domains) in experiments on 571 genes in 23 cancer types from The Cancer Genome Atlas (TCGA). In the cancer enrichment analysis, 45% of the P values for HEA were concentrated in the range 0 to 0.01, and none fell in the range 0.05 to 1. In the robustness analysis of HEA, the Spearman correlation coefficient was 0.9, and the corresponding P value was close to 0. The recognition accuracies of HEA, M2C, and OncodriveCLUST were 88%, 77%, and 56%, respectively. The time and space cost analysis also shows that HEA is more efficient than the compared methods and better suited to the analysis of pan-cancer data sets.

(2) An approach based on the mouth brooding fish algorithm is proposed to identify driver genes at the pathway level. Many methods that identify driver pathways with the Maximum Weight Submatrix model treat coverage and exclusivity as equally important and give them equal weights, but these methods ignore the impact of mutational heterogeneity. In this thesis, principal component analysis (PCA) is used to incorporate covariate data and reduce the complexity of the algorithm, and a Maximum Weight Submatrix model is constructed that assigns different weights to coverage and exclusivity. In this way, the impact of mutational heterogeneity is overcome to some extent. The method was tested on lung adenocarcinoma and glioblastoma multiforme data and compared with MDPFinder, Dendrix, and Mutex. When the driver pathway size is 10, the recognition accuracy of the method reaches 80% on both data sets, and the weight values of the submatrix are 1.7 and 1.89, respectively, both better than those of the compared methods. In addition, the signaling pathway enrichment analysis reveals the important role that the driver genes identified by the method play in cancer signaling pathways, which supports their validity from the perspective of biological effects.
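
The trade-off between coverage and exclusivity described in part (2) can be made concrete with a small sketch. It assumes the Dendrix-style Maximum Weight Submatrix score (coverage minus coverage overlap) and adds two hypothetical weights, `alpha` and `beta`. The abstract does not give the thesis's actual weighting scheme, its PCA step, or the mouth brooding fish search, so the function name, the parameters, and the toy matrix below are illustrative assumptions only.

```python
from typing import Sequence


def submatrix_weight(
    matrix: Sequence[Sequence[int]],
    genes: Sequence[int],
    alpha: float = 1.0,  # hypothetical weight on coverage
    beta: float = 1.0,   # hypothetical weight on coverage overlap
) -> float:
    """Score a gene set on a binary mutation matrix.

    matrix: rows are patients, columns are genes, entries are 0 or 1.
    genes:  column indices of the candidate gene set.

    With alpha == beta == 1 this is the Dendrix-style score
    W(M) = |coverage(M)| - overlap(M).
    """
    # Coverage: patients with at least one mutation in the gene set.
    coverage = sum(1 for row in matrix if any(row[g] for g in genes))
    # Total number of mutations observed in the gene set.
    total = sum(row[g] for row in matrix for g in genes)
    # Overlap: mutations beyond the first one in each covered patient.
    overlap = total - coverage
    return alpha * coverage - beta * overlap


# Toy data (assumption): 4 patients, 3 genes.
mutations = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
]

exclusive_pair = submatrix_weight(mutations, [0, 1])
overlapping_pair = submatrix_weight(mutations, [1, 2])
penalized_pair = submatrix_weight(mutations, [1, 2], alpha=1.0, beta=2.0)
```

In the toy matrix, genes 0 and 1 are mutually exclusive and cover three of the four patients, so they outscore genes 1 and 2, which co-occur in one patient. Raising `beta` relative to `alpha` penalizes that co-occurrence more strongly, which is one simple way to weight exclusivity and coverage unequally.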
Keywords/Search Tags: mutation cluster, driver pathway, driver gene, mouth brooding fish algorithm