Prioritizing Disease-Causing Genes Based On Heat Diffusion Model

Posted on: 2016-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: M H Fang
GTID: 2284330464472632
Subject: Computer application technology
Abstract/Summary:
In recent years, with the rapid development of bioinformatics and its applications, vast amounts of biological data have become available. How to analyze these data and extract valuable information has become a hot topic in bioinformatics. The rapid development of high-throughput biotechnologies provides a vast source of data for disease-causing gene prioritization. Biological networks, such as protein interaction networks and disease phenotype similarity networks, represent the complex relationships between genes and diseases well and support disease-causing gene prioritization. Most computational strategies follow a "guilt-by-association" approach: similar phenotypes are often caused by functionally related genes, and the products of genes associated with similar disorders have been shown to have a higher probability of physically interacting. Although these models achieve good results in disease-causing gene prediction, there is still room for improvement. This thesis focuses on prioritizing candidate disease genes based on a heat diffusion model and heterogeneous data. The main original contributions are:

(1) Most existing disease-causing gene prioritization methods treat a dangling gene (isolated gene) as network noise, so a dangling gene with no edges in the network cannot be effectively prioritized. These approaches also tend to favor genes that are highly connected in the PPI network and perform poorly on loosely connected disease genes. To address these problems, we propose a new disease-causing gene prioritization method based on network diffusion and rank concordance (NDRC). The method is evaluated by leave-one-out cross-validation on 1931 diseases for which at least one gene is known to be involved. The experimental results suggest that NDRC significantly outperforms existing methods such as RWR, VAVIEN and PRINCE in identifying loosely connected disease genes, and that it successfully ranks dangling genes as potential candidate disease genes. Furthermore, we apply NDRC to three representative diseases: Meckel syndrome 1, Protein C deficiency and Peroxisome biogenesis disorder 1A (Zellweger). Our study also finds that the causative genes of certain complex diseases can be divided into several modules that are closely associated with different disease phenotypes.

(2) High-throughput biological data are far from perfect and are reported to contain many false positives and false negatives, so disease-causing genes cannot be prioritized well from a single type of biological data. To improve performance, we propose a new disease-causing gene prioritization method based on network diffusion and heterogeneous data (NDHD). NDHD integrates protein interaction networks, a disease phenotype similarity network and a protein domain network to predict disease-causing genes. The experimental results show that NDHD has a slight advantage over ProphNet.
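
The network diffusion idea behind both methods can be illustrated with a minimal sketch. This is not the NDRC or NDHD algorithm, whose details the abstract does not give. It is a generic heat-kernel diffusion on a toy network, and it assumes an unnormalized graph Laplacian, a diffusion time t = 1.0, and hypothetical gene names G0 to G4. The function name `heat_diffusion_scores` is also invented for illustration.

```python
import numpy as np


def heat_diffusion_scores(adjacency, seed_indices, t=1.0):
    """Diffuse unit heat from the seed genes over the network for time t."""
    A = np.asarray(adjacency, dtype=float)
    # Unnormalized graph Laplacian L = D - A (an assumption of this sketch).
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so exp(-t L) = V diag(exp(-t w)) V^T.
    w, V = np.linalg.eigh(L)
    kernel = (V * np.exp(-t * w)) @ V.T
    # Initial heat: spread evenly over the known disease (seed) genes.
    h0 = np.zeros(A.shape[0])
    h0[list(seed_indices)] = 1.0 / len(seed_indices)
    return kernel @ h0


# Toy network (hypothetical): a chain G0 - G1 - G2 - G3, plus G4 with no edges.
genes = ["G0", "G1", "G2", "G3", "G4"]
adjacency = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adjacency[i, j] = adjacency[j, i] = 1.0

# G0 plays the role of the known disease gene; rank all genes by final heat.
scores = heat_diffusion_scores(adjacency, seed_indices=[0], t=1.0)
ranking = [genes[i] for i in np.argsort(-scores)]
print(dict(zip(genes, np.round(scores, 4))))
print(ranking)
```

In this toy example the heat decreases with distance from the seed gene, and the isolated gene G4 receives no heat for any value of t. That is the limitation of purely network-based diffusion that the abstract describes for dangling genes.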
Keywords/Search Tags: disease-causing genes prioritization, dangling gene, loosely connected, modules, heterogeneous data