
Research On Methods Of Identifying Disease Genes On Biomolecular Networks

Posted on: 2022-01-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H X Shang
GTID: 1480306311467054
Subject: Biomedical engineering
Abstract/Summary:
The pathogenesis of major diseases is highly complex. From a genetic point of view, complex diseases result from gene-gene and gene-environment interactions, so identifying disease genes is a key problem in complex disease research. With the development of high-throughput technology, multi-omics data have become available; they reflect the molecular changes of complex diseases at different levels and help reveal their pathogenesis. It is therefore important to develop effective bioinformatics methods that analyze multi-omics data to identify disease genes. Most existing methods are based on linkage analysis and genome-wide association studies, which cannot effectively pinpoint specific disease genes and suffer from high cost and many false positives.

Molecules in cells usually perform specific biological functions through interaction networks. It is therefore necessary to identify disease genes from the perspective of molecular networks, and the most widely used algorithm for identifying disease genes in molecular networks is the random walk algorithm, especially the PageRank algorithm. Although this algorithm has achieved much in identifying disease genes, there is still room for improvement in integrating multi-omics data, the corresponding multi-layer networks, and prior knowledge of genetic information. Building on the random walk algorithm, and taking into account the available multi-omics data, the corresponding multi-layer molecular networks, and prior knowledge of genetic information, this dissertation systematically studies the identification of disease genes and disease modules in molecular networks under different application settings. It proposes the bilayer Rank algorithm for bilayer heterogeneous molecular networks, the integrative Rank algorithm for multi-layer biomolecular networks, the tensor Rank algorithm for high-dimensional molecular networks, and a phenotype-driven module detection and ranking algorithm for identifying modules in networks. Together, these provide a feasible approach to integrating multi-omics data and network data to identify disease genes and network modules. The research contents are summarized as follows:

(1) Many current studies identify disease genes with the PageRank algorithm on a single biomolecular network, whereas relatively few focus on bilayer biomolecular networks. This dissertation proposes a random walk algorithm for identifying disease genes in bilayer heterogeneous networks, called the bilayer Rank algorithm. It combines two kinds of disease omics data with the corresponding two-layer molecular network through weighting, and adds prior knowledge of genetic information, to construct a specific two-layer heterogeneous molecular network. Running the bilayer Rank algorithm on this network yields a bilayer Rank value for each node, which measures the node's importance. The algorithm was applied to identify disease genes of type ? diabetes mellitus. The results show that the bilayer Rank algorithm can effectively identify disease genes, providing a reference for integrating two kinds of omics data and the corresponding bilayer network to identify disease genes.

(2) Effectively combining multi-omics data with the corresponding multi-layer network data to identify disease genes has become a hot research topic. Most existing methods integrate either multiple network structures or certain kinds of omics data with a network. This dissertation proposes an integrated constrained random walk algorithm for identifying disease genes in multi-layer biomolecular networks, called the integrative Rank algorithm. It combines multi-omics data with the corresponding multi-layer network through weighting, and adds prior knowledge of genetic information, to construct a specific multi-layer molecular network; the information flow is also embedded into the algorithm as a constraint. Running the integrative Rank algorithm yields an integrative Rank value for each node, which measures the node's importance. The algorithm was applied to identify disease genes of hepatocellular carcinoma and prostate adenocarcinoma. The results show that the integrative Rank algorithm can effectively identify disease genes, and comparisons with other methods show its advantages. It provides a reference for integrating multi-omics data and the corresponding multi-layer molecular networks to identify disease genes, with prior knowledge embedded into the algorithm as a constraint.

(3) Identifying the disease genes of complex diseases involves heterogeneous multi-omics data, and how to effectively integrate these data with molecular networks has become a key research question. Most existing methods integrate the centralities of multiple single-layer networks, which may ignore the integrity of the data. This dissertation proposes a tensor-based random walk algorithm for identifying disease genes in high-dimensional molecular networks, called the tensor Rank algorithm. It combines heterogeneous multi-omics data with a multi-attribute molecular network through weighting to construct a high-dimensional molecular network represented as a tensor. Running the tensor Rank algorithm yields a tensor Rank value for each node, which measures the node's importance. The tensor Rank algorithm was applied to identify disease genes of type ? diabetes mellitus and Alzheimer's disease. The results show that it can effectively identify disease genes, and comparisons with other algorithms show its advantages. By moving the computation from matrix space to tensor space, it offers a general framework for identifying disease genes from multi-dimensional heterogeneous high-throughput omics data.

(4) In biological networks, molecules generally perform their functions by forming network modules or signaling pathways, so identifying network modules has become an important problem. Most existing methods are based on either network clustering or gene set analysis, but few algorithms combine the two. This dissertation proposes a phenotype-driven module detection and ranking algorithm, called the module Rank algorithm, for identifying disease modules in biomolecular networks. The algorithm combines a single type of omics data with a single-layer molecular network through weighting to construct a disease-specific molecular network. It uses a guided module detection strategy, builds a network hypergraph with disease modules as nodes, and ranks these modules with a hypergraph-based PageRank algorithm. The module Rank algorithm was applied to identify hepatocellular carcinoma modules. The analysis shows that it can identify disease-related network modules, and comparisons with other methods show its advantages. By integrating a phenotype-driven module detection technique, the module Rank algorithm extends the identification of single-node pattern features to local subnetwork features.
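The PageRank-style random walk that all four algorithms build on can be illustrated with a random walk with restart, also known as personalized PageRank. This is a generic sketch, not the dissertation's implementation. It assumes an undirected weighted gene network given as an edge dictionary, a restart probability of 0.3, and a `prior` dictionary standing in for prior knowledge of genetic information (for example, known disease genes). The gene names, weights, and the function name `random_walk_with_restart` are hypothetical.

```python
def random_walk_with_restart(edges, prior, restart=0.3, tol=1e-10, max_iter=1000):
    """Personalized PageRank on an undirected weighted network (illustrative sketch).

    edges:   dict mapping (u, v) -> positive edge weight
    prior:   dict mapping node -> non-negative prior score (hypothetical seed genes)
    restart: probability of jumping back to the prior distribution at each step
    """
    # Collect all nodes and build a symmetric weighted adjacency list.
    nodes = sorted({n for edge in edges for n in edge} | set(prior))
    neighbors = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        neighbors[u][v] = neighbors[u].get(v, 0.0) + w
        neighbors[v][u] = neighbors[v].get(u, 0.0) + w

    # Normalise the prior into the restart distribution p0.
    total = sum(prior.values())
    p0 = {n: prior.get(n, 0.0) / total for n in nodes}

    # Power iteration: p <- restart * p0 + (1 - restart) * W^T p.
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            out = sum(neighbors[u].values())
            if out == 0:
                # Dangling node: return its walking mass to the restart vector.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * p0[n]
                continue
            for v, w in neighbors[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical toy network with G1 as the only known disease gene.
edges = {("G1", "G2"): 1.0, ("G2", "G3"): 0.5, ("G3", "G4"): 1.0, ("G1", "G3"): 0.2}
scores = random_walk_with_restart(edges, prior={"G1": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['G1', 'G2', 'G3', 'G4']
```

Genes that are strongly connected to the seeds receive higher stationary probabilities, so sorting by score yields a candidate disease gene ranking. A larger `restart` keeps the walker closer to the prior, while a smaller one lets network topology dominate.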
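The bilayer idea in (1), a walker that moves both within and between two molecular layers whose edges are weighted by omics data, can be sketched as a random walk with restart on a two-layer heterogeneous network. This is not the bilayer Rank algorithm itself, whose exact weighting and transition rules are not given in this abstract. All of the following are hypothetical assumptions: node IDs are unique across the two layers, each edge weight is the mean omics score of its two endpoints, and a parameter `jump` sets the probability of crossing to the other layer when inter-layer links exist.

```python
from collections import defaultdict


def weighted_adjacency(edges, omics):
    """Undirected adjacency; each edge is weighted by the mean omics score of its
    endpoints (a hypothetical weighting scheme)."""
    adj = defaultdict(dict)
    for u, v in edges:
        w = (omics.get(u, 0.0) + omics.get(v, 0.0)) / 2
        adj[u][v] = w
        adj[v][u] = w
    return adj


def normalise(row):
    """Turn a weight row into transition probabilities (empty if the total is 0)."""
    total = sum(row.values())
    return {k: x / total for k, x in row.items()} if total > 0 else {}


def bilayer_rwr(edges_a, edges_b, cross_edges, omics, prior,
                jump=0.5, restart=0.3, iters=200):
    intra = weighted_adjacency(list(edges_a) + list(edges_b), omics)
    inter = weighted_adjacency(cross_edges, omics)
    nodes = sorted(set(intra) | set(inter) | set(prior))

    # Row-stochastic transitions: stay in the layer with probability 1 - jump and
    # cross with probability jump; a node with only one kind of link uses that kind.
    trans = {}
    for u in nodes:
        stay, move = normalise(intra.get(u, {})), normalise(inter.get(u, {}))
        if stay and move:
            row = defaultdict(float)
            for v, q in stay.items():
                row[v] += (1 - jump) * q
            for v, q in move.items():
                row[v] += jump * q
            trans[u] = dict(row)
        else:
            trans[u] = stay or move

    total = sum(prior.values())
    p0 = {n: prior.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            # Nodes with no usable links send their mass back to the prior.
            for v, q in (trans[u] or p0).items():
                nxt[v] += (1 - restart) * p[u] * q
        p = nxt
    return p


layer_a = [("A1", "A2"), ("A2", "A3")]  # first molecular layer (hypothetical)
layer_b = [("B1", "B2")]                # second molecular layer (hypothetical)
cross = [("B1", "A3"), ("B2", "A1")]    # inter-layer links (hypothetical)
omics = {"A1": 2.0, "A2": 1.0, "A3": 0.5, "B1": 1.5, "B2": 1.0}
scores = bilayer_rwr(layer_a, layer_b, cross, omics, prior={"A1": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['A1', 'B2', 'A2', 'B1', 'A3']
```

In this toy example B2 ranks second even though it lies in the other layer, because it is directly linked to the seed A1: the inter-layer links let evidence from one omics layer propagate into the other.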
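The ranking step in (4), PageRank on a hypergraph whose nodes are disease modules, can be sketched with the standard hypergraph random walk. From a node, the walker chooses an incident hyperedge in proportion to its weight, then moves to a uniformly chosen member of that hyperedge. The abstract does not say how module Rank defines its hyperedges, so this sketch simply takes a hypothetical list of weighted hyperedges over modules (for instance, groups of modules that share genes). The module names, weights, and the phenotype-based `prior` used as the teleport vector are likewise illustrative assumptions.

```python
def hypergraph_pagerank(hyperedges, prior=None, damping=0.85, iters=200):
    """PageRank on a weighted hypergraph (illustrative sketch).

    hyperedges: list of (frozenset_of_nodes, positive_weight)
    prior:      optional dict node -> non-negative teleport weight, e.g. a
                hypothetical phenotype-association score; uniform if None
    """
    nodes = sorted({v for members, _ in hyperedges for v in members})
    incident = {v: [] for v in nodes}
    for members, w in hyperedges:
        for v in members:
            incident[v].append((members, w))

    # Two-step walk: pick an incident hyperedge in proportion to its weight,
    # then pick one of its members uniformly at random.
    trans = {}
    for v in nodes:
        degree = sum(w for _, w in incident[v])
        row = {}
        for members, w in incident[v]:
            for u in members:
                row[u] = row.get(u, 0.0) + (w / degree) / len(members)
        trans[v] = row

    if prior is None:
        prior = {v: 1.0 for v in nodes}
    total = sum(prior.get(v, 0.0) for v in nodes)
    teleport = {v: prior.get(v, 0.0) / total for v in nodes}

    # Power iteration: p <- (1 - damping) * teleport + damping * T^T p.
    p = dict(teleport)
    for _ in range(iters):
        nxt = {v: (1 - damping) * teleport[v] for v in nodes}
        for v in nodes:
            for u, q in trans[v].items():
                nxt[u] += damping * p[v] * q
        p = nxt
    return p


# Hypothetical modules M1..M4 grouped into weighted hyperedges.
modules = [
    (frozenset({"M1", "M2"}), 1.0),
    (frozenset({"M1", "M2", "M3"}), 2.0),
    (frozenset({"M3", "M4"}), 0.5),
]
scores = hypergraph_pagerank(modules, prior={"M3": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Concentrating the teleport vector on a phenotype-relevant module (M3 here) pulls it to the top, while M4, which belongs only to one light hyperedge, ranks last. With `prior=None` the sketch reduces to ordinary, unpersonalized hypergraph PageRank.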
Keywords/Search Tags: Biomolecular network, Disease gene identification, Random walk algorithm, Multi-omics data integration, Bioinformatics and machine learning