Algorithm Development For Cancer Biomarker Detection Based On Multi-omics Data

Posted on: 2020-03-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L T Su
GTID: 1364330602455534
Subject: Computer application technology
Abstract/Summary:
Cancer brings great suffering to patients and imposes a heavy economic burden on countless families and on the country as a whole. Biomarkers are of great value in the early diagnosis of cancer, which is the key to effective treatment. Biomarkers such as genes, microRNAs, and mutations are indicators that can be used to objectively detect and evaluate the occurrence, development, and prognosis of cancer. Although some cancer-related biomarkers have been discovered, most of them still cannot be used for cancer prediction, diagnosis, or prognosis assessment. One reason is that the pathological mechanism of cancer is extremely complex. Different patients usually have different biomarkers even when they suffer from the same type of cancer, and the same biomarker may occur in different cancer types. A growing body of evidence indicates that cancer results from the combined effect of gene and microRNA mutations, post-transcriptional modifications, disordered regulatory relationships, and environmental factors.

New biomarker detection methods are therefore urgently needed. Experimental methods are expensive and time-consuming, while computational methods can greatly reduce costs and shorten detection cycles. With the development of sequencing technology, large amounts of omics data (genomics, transcriptomics, proteomics, and metabolomics) have emerged. These data enable us to understand the pathological causes of cancer systematically, and they give computational methods enough datasets for identifying cancer biomarkers. The main research content of this thesis is how to design and implement efficient cancer biomarker detection algorithms that integrate multi-omics datasets and provide researchers with more valuable biomarker hypotheses for targeted research.

A variety of algorithms have been proposed in the field of cancer biomarker detection. Through in-depth study and systematic analysis of related algorithms, we find that methods in this field can be divided into three categories according to their research objectives. The first category is single marker-based methods, such as gene differential expression analysis. The second category is functional module-based methods, such as clustering analysis of gene expression data. The third category focuses on biological networks, for example identifying new markers through their network adjacency to existing biomarkers. Although these methods achieve a certain recognition accuracy, challenges remain. First, genes, microRNAs, and other molecules act within functional modules. Algorithms of the first category do not consider the importance of a marker within its functional module, while methods of the second category rarely consider the dynamic changes of functional modules. Second, most methods analyze the transcriptome data of genes and of microRNAs separately, which makes it difficult to identify regulatory changes between biomarkers, especially between microRNAs and genes, even though studies have shown that changes in the regulatory relationship between genes and microRNAs are related to cancer. Third, most methods are based on single-omics data, and their recognition efficiency and accuracy are low, so they cannot meet the needs of large-scale analyses across the whole genome and across multiple cancers. In addition, most algorithms do not conduct sample survival analysis for the resulting biomarkers.

Considering these challenges, we carried out the following research:

1. We proposed a new gene-gene interaction (GGI) network construction algorithm called LPRP (Linear and Probabilistic Relations Prediction). Based on this algorithm and related omics data, we constructed GGI networks for breast cancer samples and for normal samples. We systematically compared the similarities and differences between the two networks from the perspective of their genes, functional modules, and network connections. This work lays the foundation for designing our biomarker detection algorithms around changes in genes, functional modules, and regulatory relationships.

2. We proposed a new differential module-based cancer biomarker detection algorithm called MGOGP (Module and Gene Ontology-based Gene Prioritization). Genes function as functional modules (gene→functional module network), so key biomarker genes often form significantly changed functional modules. MGOGP considers both the importance of the genes themselves and the importance of the differential modules they belong to, and it introduces GO fuzzy measurement values between genes and known cancer biomarker genes for heuristic search. The algorithm addresses the problem that current algorithms ignore the role of genes in functional modules and the dynamic changes of functional modules.

3. We proposed a new algorithm called rfnGMI (rectified factor network for cancer-related coding Gene, MicroRNA and their Interactions detection) for identifying cancer-related coding genes, microRNAs, and their interactions. rfnGMI uses an efficient biclustering method to identify cancer-specific functional modules and measures the differential expression and differential correlation values of all coding genes and microRNAs in each module. All genes and microRNAs in a module are prioritized using the protein interaction network, known biomarkers, and the importance of the module, and a global rank is obtained with a rank fusion strategy. The algorithm considers the dynamic changes of functional modules and addresses the fact that current methods do not consider changes in regulation between genes and microRNAs.

4. We designed and implemented a new algorithm called BISG (BIclustering based Survival related Gene sets detection), based on an adaptation of the rectified factor network model. BISG integrates the analysis of transcriptome and genomic data, adopts multiple-iteration and random-sampling strategies, and analyzes the relationship between statistically significant bicluster genes and patient survival data using a log-rank test. Results showed that the biomarker gene sets identified by the algorithm can significantly distinguish patients' survival curves. The algorithm addresses the exponential growth of the search space caused by the combinatorial explosion of gene combinations.

By systematically analyzing twelve different cancer datasets, we found that the survival-related biomarker genes came mainly from five gene families: microRNA protein coding host genes, zinc finger C2H2, solute carriers, cluster of differentiation molecules, and ankyrin repeat domain containing genes. In addition, we found that these genes are mainly involved in heme metabolism, apoptosis, hypoxia, and inflammatory responses. All of these results are consistent with existing research, which further supports the effectiveness of our algorithm.
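
The abstract states that BISG uses a log-rank test to relate gene sets to patient survival, but it gives no implementation details. The following is a minimal sketch of a standard two-group log-rank test, not the thesis code. The function name `logrank_test`, the idea of splitting patients into two groups by a gene set, and the toy survival times are all illustrative assumptions.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value).

    times_*  : follow-up time of each patient in the group
    events_* : 1 if death was observed at that time, 0 if censored
    """
    # Pool both groups, remembering the group label (0 = A, 1 = B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Only time points with at least one observed event contribute.
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t in each group.
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)
        n_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)
        # Events occurring exactly at time t in each group.
        d_a = sum(1 for ti, e, g in data if g == 0 and e and ti == t)
        d_b = sum(1 for ti, e, g in data if g == 1 and e and ti == t)
        n, d = n_a + n_b, d_a + d_b

        observed_a += d_a
        expected_a += d * n_a / n  # expected events in A under the null
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Toy data (invented): months of follow-up for two hypothetical
    # patient groups defined by a candidate gene set.
    group_a = ([5, 8, 12, 14, 20], [1, 1, 1, 0, 1])
    group_b = ([18, 25, 30, 34, 40], [1, 0, 1, 1, 0])
    chi2, p = logrank_test(*group_a, *group_b)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A small p-value suggests that the two groups have different survival curves, which is the sense in which a gene set can be said to "distinguish" patients. How BISG forms the groups and corrects for testing many gene sets is not specified in the abstract and is not modeled here.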
Keywords/Search Tags: Biclustering, functional module, microRNA, survival analysis, gene-microRNA interaction, rectified factor network model