Prioritizing Disease Candidate Genes Based On PPI Network

Posted on: 2015-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: Q Li
GTID: 2180330434954083
Subject: Computer Science and Technology
Abstract/Summary:
Identifying disease-causing genes is a key problem in human genetics research. Traditional positional cloning strategies can narrow the location of a disease gene to a region that may still contain tens to hundreds of candidate genes, and validating these candidates one by one through biological experiments is time-consuming and laborious. Bioinformatics methods can reduce this cost and identify disease candidate genes quickly and effectively. Among them, prioritization methods based on protein-protein interaction (PPI) networks have shown comparatively good performance. This thesis focuses on prioritizing disease candidate genes based on PPI networks, and its main results are as follows.

First, we developed a shortest-path-based algorithm, named SPranker, to prioritize disease candidate genes in a protein interaction network. Because diseases with similar phenotypes are generally caused by functionally related genes, we further proposed a new algorithm, SPGOranker, which integrates the semantic similarity of GO annotations: it considers not only the shortest path between protein pairs in the protein interaction network but also their GO semantic similarity. We applied SPranker and SPGOranker to 1598 known orphan disease-causing genes (ODGs) from 172 orphan diseases (ODs) and compared them with three state-of-the-art approaches, ICN, VS and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS and RWR in prioritizing orphan disease-causing genes. We further applied our methods to identify and rank potential novel candidate genes for several ODs.

Second, we proposed TrustRanker, an algorithm based on a search engine ranking method, to prioritize disease candidate genes. We constructed a bipartite graph, called the diseasome, consisting of two disjoint sets of nodes. Starting from the diseasome bipartite graph, we generated two biologically relevant network projections, the human disease network (HDN) and the disease gene network (DGN), and analyzed the topological characteristics of both networks. We used two types of similarity between diseases, topological similarity and disease phenotype similarity, to select genes associated with specific diseases as seeds, and used these seed genes to initialize the start probability matrix of TrustRanker. We tested our method on gene-disease association data and evaluated the prioritization achieved. Using data on 2666 diseases from the OMIM knowledgebase, we performed large-scale cross-validation to rank the candidate genes and to evaluate and compare the performance of our approach. Our results show that our method outperforms Prince and PRP. Finally, we applied our method to study three multifactorial diseases for which some causal genes have already been found.
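The two graph operations mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes the two node sets of the diseasome are diseases and genes, that a projection links two diseases when they share a gene and two genes when they share a disease, and that a candidate is scored by its mean shortest-path distance to the known disease genes. The abstract does not give the actual SPranker scoring rule, so that score, the function names, and the toy data are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def project_diseasome(disease_to_genes):
    """Project a disease-gene bipartite graph onto diseases (HDN) and genes (DGN).

    Assumed edge rule: two diseases are linked if they share a gene,
    and two genes are linked if they share a disease.
    """
    hdn = defaultdict(set)
    dgn = defaultdict(set)
    gene_to_diseases = defaultdict(set)
    for disease, genes in disease_to_genes.items():
        for g1, g2 in combinations(sorted(genes), 2):
            dgn[g1].add(g2)
            dgn[g2].add(g1)
        for gene in genes:
            gene_to_diseases[gene].add(disease)
    for diseases in gene_to_diseases.values():
        for d1, d2 in combinations(sorted(diseases), 2):
            hdn[d1].add(d2)
            hdn[d2].add(d1)
    return dict(hdn), dict(dgn)


def bfs_distances(ppi, source):
    """Unweighted shortest-path lengths from source to every reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in ppi.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_candidates(ppi, seeds, candidates):
    """Rank candidates by mean shortest-path distance to the seed genes.

    The mean-distance score is an assumption for illustration only;
    unreachable candidates get an infinite distance and sort last.
    """
    seed_dists = [bfs_distances(ppi, seed) for seed in seeds]

    def score(gene):
        distances = [d.get(gene, float("inf")) for d in seed_dists]
        return sum(distances) / len(distances)

    return sorted(candidates, key=lambda gene: (score(gene), gene))


# Hypothetical toy data: a tiny diseasome and a tiny PPI network.
toy_diseasome = {"D1": {"A", "B"}, "D2": {"B", "C"}, "D3": {"E"}}
toy_ppi = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": set()}

hdn, dgn = project_diseasome(toy_diseasome)
ranking = rank_candidates(toy_ppi, seeds=["A"], candidates=["D", "C", "E"])
print(hdn, dgn, ranking)
```

In this toy setup a candidate that is closer to the seed gene in the network ranks higher, which reflects the general intuition behind shortest-path prioritization. A GO-aware variant in the spirit of SPGOranker would additionally need a semantic similarity measure, which this sketch leaves out.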
Keywords/Search Tags:disease candidate genes, protein interaction network, prioritization, TrustRanker