Prioritizing Disease-Causing Genes Based On Heat Diffusion Model

Posted on:2016-10-07

Degree:Master

Type:Thesis

Country:China

Candidate:M H Fang

Full Text:PDF

GTID:2284330464472632

Subject:Computer application technology

Abstract/Summary:

In recent years, the rapid development of bioinformatics and its applications has produced vast amounts of biological data. How to analyze these data and extract valuable information has become a hot topic in the field. High-throughput biotechnologies in particular provide a rich source of data for disease-causing gene prioritization. Biological networks, such as protein-protein interaction (PPI) networks and disease phenotype similarity networks, represent the complex relationships between genes and diseases well and provide support for disease-causing gene prioritization.

Most computational strategies follow a "guilt-by-association" approach: similar phenotypes are often caused by functionally related genes, and the products of genes associated with similar disorders have been shown to have a higher probability of physically interacting. Although these models have achieved good results in predicting disease-causing genes, there is still room for improvement. This thesis focuses on prioritizing candidate disease genes based on a heat diffusion model and heterogeneous data. The main original contributions are:

(1) Most existing disease-causing gene prioritization methods tend to treat a dangling gene (an isolated gene) as network noise, so a dangling gene with no edges in the network cannot be effectively prioritized. These approaches also tend to favor genes that are highly connected in the PPI network and perform poorly on loosely connected disease genes. To address these problems, we propose a new disease-causing gene prioritization method based on network diffusion and rank concordance (NDRC). The method is evaluated by leave-one-out cross-validation on 1931 diseases for which at least one gene is known to be involved. The experimental results suggest that NDRC significantly outperforms existing methods such as RWR, VAVIEN and PRINCE at identifying loosely connected disease genes, and that it successfully ranks dangling genes as potential candidate disease genes. Furthermore, we apply NDRC to three representative diseases: Meckel syndrome 1, Protein C deficiency and Peroxisome biogenesis disorder 1A (Zellweger). Our study also finds that the causal genes of certain complex diseases can be divided into several modules, each closely associated with a different disease phenotype.

(2) High-throughput biological data are far from perfect and are reported to contain many false positives and false negatives, so disease-causing genes cannot be prioritized well from a single type of biological data. To improve performance, we propose a new disease-causing gene prioritization method based on network diffusion and heterogeneous data (NDHD). NDHD integrates protein interaction networks, a disease phenotype similarity network and a protein domain network to predict disease-causing genes. The experimental results show that NDHD has a slight advantage over ProphNet.
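
The abstract does not give the formulation of NDRC or NDHD, so the following is only a minimal sketch of generic heat diffusion on an undirected gene network, not the method of the thesis. It approximates h(t) = exp(-tL) h(0), where L = D - A is the graph Laplacian, by small forward-Euler steps. The function name, its parameters and the gene labels G1 to G5 are hypothetical and chosen purely for illustration.

```python
def heat_diffusion_scores(edges, seeds, nodes=(), t=1.0, steps=1000):
    """Approximate h(t) = exp(-t * L) h(0), with L = D - A, by forward-Euler steps.

    edges: iterable of (u, v) pairs of an undirected network (illustrative input).
    seeds: set of known disease genes; each starts with one unit of heat.
    nodes: extra genes to include even if they have no edges (dangling genes).
    """
    # Build neighbor sets; dangling genes get an empty set.
    neighbors = {n: set() for n in nodes}
    for s in seeds:
        neighbors.setdefault(s, set())
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    heat = {n: (1.0 if n in seeds else 0.0) for n in neighbors}

    # Forward Euler is only stable for small steps: keep dt * (largest degree)
    # well below 1, otherwise increase `steps`.
    dt = t / steps
    for _ in range(steps):
        # -(L h)[n] = sum over neighbors m of (h[m] - h[n])
        flow = {n: sum(heat[m] - heat[n] for m in nbrs)
                for n, nbrs in neighbors.items()}
        heat = {n: heat[n] + dt * flow[n] for n in heat}
    return heat


# Toy network: a path G1 - G2 - G3 - G4 plus a dangling gene G5.
toy_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
scores = heat_diffusion_scores(toy_edges, seeds={"G1"}, nodes={"G5"})
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 3))
```

In this toy run the candidates closer to the seed gene receive more heat, while the dangling gene G5 receives none at all. That is the limitation of purely network-based ranking described in contribution (1), and it is why additional evidence is needed to rank such genes.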

Keywords/Search Tags:

disease-causing genes prioritization, dangling gene, loosely connected, modules, heterogeneous data

Related items

1	Reconstructs Rare Disease Classification With The Integration Of Systems-level Molecular Data And Phenotypic Data
2	Prioritization Of Genes Related To Nicotine Addiction
3	Prioritization Of Candidate Disease Genes Based On Topological Similarity And Optimized PPI Network
4	Predicting Disease Genes Based On Normalized Modules And Self-Adaptive Hopping Random Walk
5	Prioritization Of Candidate Disease Genes By Combining Topological Similarity With Semantic Similarity
6	Research Of Disease Genes Identification Based On Microarray Data
7	Gene Expression Profiles Analysis On Hepatocellular Carcinoma And Gallbladder Carcinoma; Study On The Single Nucleotide Polymorphism Of Genes In Hepatocellular Carcinoma
8	Modifier genes in the phenotypic manifestation of primary disease-causing mutations
9	Research On Genotype-phenotype Association Base On Memory Computing
10	Classification Of Gene Expression Data Based On Improved Salp Swarm Algorithm And Heterogeneous Integrated Learning