
Identification And Application Of Protein Complexes In Protein Interaction Networks

Posted on: 2014-05-17   Degree: Doctor   Type: Dissertation
Country: China   Candidate: J Ren   Full Text: PDF
GTID: 1260330401979011   Subject: Computer Science and Technology
Abstract/Summary:
As protein-protein interaction (PPI) networks become increasingly complete, many researchers have focused on identifying protein complexes in large-scale PPI networks. However, because protein complexes have diverse topological structures, are organized hierarchically, and overlap one another, identifying them accurately and efficiently in PPI networks remains challenging. To address these problems, this work focuses on identifying and applying protein complexes in PPI networks. The main original work includes:

1) Identifying protein complexes with various topological structures: Because much evidence indicates that dense subgraphs or modules in PPI networks usually correspond to protein complexes, PPI-network-based detection methods have generally focused on identifying either dense subgraphs only or modules only. However, density-based methods have difficulty mining protein complexes with low density, which generally correspond to modules in the PPI network, and modularity-based methods have difficulty mining protein complexes with low modularity, which generally correspond to dense subgraphs. To identify protein complexes with various topological structures, including those that are modules but have low density and those that have high density but are not modules, a novel fitness function is proposed that considers both the density and the modularity of a subgraph, and a novel algorithm, named LF-PIN, is developed to identify protein complexes by expanding seed edges into subgraphs with the maximum fitness value in the PPI network.
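The seed-edge expansion idea can be illustrated with a minimal Python sketch. The abstract does not give LF-PIN's actual fitness formula, so everything below is an assumption made for illustration: `density` is the standard edge density of the induced subgraph, `modularity_ratio` is the fraction of the subgraph's incident edges that stay inside it, and `fitness` simply averages the two. The function names and the toy graph are hypothetical, and the greedy loop is one plausible reading of "expanding seed edges", not the dissertation's algorithm.

```python
def internal_edges(adj, nodes):
    """Number of edges with both endpoints inside `nodes`."""
    return sum(1 for u in nodes for v in adj[u] if v in nodes) // 2


def density(adj, nodes):
    """Edge density of the induced subgraph: 2|E| / (|V| (|V| - 1))."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2.0 * internal_edges(adj, nodes) / (n * (n - 1))


def modularity_ratio(adj, nodes):
    """Share of the subgraph's incident edges that are internal: in / (in + out)."""
    inside = internal_edges(adj, nodes)
    outside = sum(1 for u in nodes for v in adj[u] if v not in nodes)
    total = inside + outside
    return inside / total if total else 0.0


def fitness(adj, nodes):
    # ASSUMPTION: an equal-weight average of density and modularity.
    # The dissertation's real fitness function is not stated in the abstract.
    return 0.5 * (density(adj, nodes) + modularity_ratio(adj, nodes))


def expand_seed_edge(adj, seed):
    """Greedily add the neighbour that raises fitness most; stop when none does."""
    cluster = set(seed)
    best = fitness(adj, cluster)
    while True:
        frontier = sorted({v for u in cluster for v in adj[u]} - cluster)
        if not frontier:
            break
        score, node = max(
            ((fitness(adj, cluster | {v}), v) for v in frontier),
            key=lambda pair: pair[0],
        )
        if score <= best:
            break
        cluster.add(node)
        best = score
    return cluster


# Hypothetical toy network: a triangle a-b-c with a tail c-d-e.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(sorted(expand_seed_edge(adj, ("a", "b"))))  # ['a', 'b', 'c']
```

On this toy graph the seed edge (a, b) grows into the triangle and stops, because adding the tail node d lowers both the density and the combined score.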
Experimental results show that, compared with seven other competing methods (CMC, Core-Attachment, CPM, DPClus, HC-PIN, MCL, and NFC), LF-PIN identifies known protein complexes more effectively, especially protein complexes with low density or low modularity.

2) Identifying hierarchical and overlapping protein complexes: In biological organisms, protein complexes are hierarchical and overlapping. Among protein complex detection methods, only hierarchical clustering methods can detect the hierarchical structure of protein complexes. Because the initial clusters of these methods are non-overlapping proteins, the protein complexes they identify do not overlap. To overcome this limitation, we propose two novel protein complex detection algorithms: OH-PIN and MCSE. OH-PIN is a hierarchical clustering method, so it detects the hierarchical structure of protein complexes naturally. Because its initial clusters overlap, the protein complexes built from them also overlap. MCSE is a seed-expansion method. Because a protein can be visited from different seeds and added to their clusters, MCSE identifies overlapping protein complexes naturally. Because MCSE uses the parameter λ to control the expansion range of seeds, it can identify protein complexes at different levels and detect a hierarchical organization of protein complexes by tuning the value of λ. Experimental results on S. cerevisiae show that the hierarchical organizations detected by the two methods are similar to the hierarchical organization of GO annotations and to that of the known complexes in MIPS. Compared with other competing methods, both OH-PIN and MCSE identify known protein complexes more effectively, especially protein complexes at the high levels.
Of the two methods, OH-PIN performs better on small, high-confidence PPI networks, whereas MCSE runs much faster than OH-PIN and is better suited to identifying protein complexes in large PPI networks.

3) Identifying protein complexes by integrating the PPI network with other biological data: Much evidence indicates that the probability that a PPI belongs to a protein complex is highly correlated with some of its biological characteristics. To improve the prediction accuracy of protein complexes, a multi-data integration method, named MD-WPIN, is proposed. Using a logistic regression model to evaluate the effect of four factors (a PPI’s essentiality, the cellular localization of its two proteins, its edge clustering value in the PPI network, and its reliability) on the probability that the PPI belongs to a protein complex, MD-WPIN integrates these four kinds of biological data and establishes a weighted PPI network of S. cerevisiae, named YDIPW+. We apply several protein complex detection methods, including LF-PIN and MCSE, to the unweighted PPI network of S. cerevisiae, to other weighted PPI networks of S. cerevisiae, and to YDIPW+, and compare their performance. Experimental results show that all algorithms perform best when run on YDIPW+.

4) Using protein complexes to identify essential proteins: Identifying essential proteins is important for discovering disease genes and establishing drug targets. Based on the finding that essential proteins are highly correlated with protein complexes, we define a novel measure, named Complex_C, to identify essential proteins based on protein complexes. Experimental results show that Complex_C and topological centrality are both important predictors of protein essentiality and that they are complementary. Thus, a novel measure, named HC, is proposed by integrating Complex_C and subgraph centrality (SC).
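For readers unfamiliar with subgraph centrality, the sketch below computes it from its standard definition, SC(i) = Σ_k (A^k)_ii / k!, where A is the adjacency matrix (equivalently, the i-th diagonal entry of the matrix exponential of A). It covers only SC: the abstract does not define Complex_C or state how HC combines the two, so neither is reproduced here. The function name, the truncation length `terms`, and the three-node path graph are illustrative assumptions.

```python
import math


def subgraph_centrality(adj_matrix, terms=30):
    """Approximate SC(i) = sum_{k>=0} (A^k)_{ii} / k! by a truncated series."""
    n = len(adj_matrix)
    # term holds A^k / k!; it starts as the identity matrix (k = 0).
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    sc = [1.0] * n  # contribution of the k = 0 term
    for k in range(1, terms):
        # A^k / k! = (A^(k-1) / (k-1)!) * A / k
        term = [
            [sum(term[i][m] * adj_matrix[m][j] for m in range(n)) / k
             for j in range(n)]
            for i in range(n)
        ]
        for i in range(n):
            sc[i] += term[i][i]
    return sc


# Hypothetical toy network: a path 0 - 1 - 2.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
sc = subgraph_centrality(path)
print([round(x, 4) for x in sc])
```

The middle node takes part in more closed walks than the end nodes, so it receives the highest score; for this path its exact value is cosh(√2).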
To improve performance further, we integrate two other predictors, a protein’s cellular localization and biological process, into the PPI network to construct a weighted PPI network, extend the HC measure to the HCW measure, and apply HCW to the weighted PPI network to identify essential proteins. Experimental results on S. cerevisiae show that, when the top 5% to top 25% of proteins are selected as candidate essential proteins, the number of essential proteins identified by HC improves by 9.1% to 15.2% over the best results of six centrality measures, and the number identified by HCW improves by 4.2% to 11.5% over the results of HC.

The methods proposed in this work approach the identification of protein complexes from different angles, solve several of its problems effectively, and improve performance. For example, LF-PIN targets protein complexes with various topological structures, OH-PIN and MCSE aim to identify hierarchical and overlapping protein complexes, and MD-WPIN integrates other biological data into the PPI network. Finally, we use protein complexes to identify essential proteins, which provides a new approach to essential protein identification.
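The top-k% evaluation reported for item 4) can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's evaluation code: the function names are hypothetical, the scores and the essential set are made-up toy data, and the reported percentages are read as relative gains in the number of essential proteins found.

```python
def essential_hits(scores, essential, fraction):
    """Count known essential proteins among the top `fraction` of ranked proteins."""
    # Rank by descending score; break ties by name so the order is deterministic.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    k = max(1, int(len(ranked) * fraction))
    return sum(1 for p in ranked[:k] if p in essential)


def relative_improvement(new_hits, old_hits):
    """Relative gain of one measure over another, as a fraction of the old count."""
    return (new_hits - old_hits) / old_hits


# Hypothetical toy data: two scoring measures over eight proteins.
essential = {"p1", "p2", "p3"}
baseline = {"p1": 0.9, "p4": 0.8, "p5": 0.7, "p2": 0.6,
            "p3": 0.5, "p6": 0.4, "p7": 0.3, "p8": 0.2}
combined = {"p1": 0.9, "p2": 0.8, "p4": 0.7, "p3": 0.6,
            "p5": 0.5, "p6": 0.4, "p7": 0.3, "p8": 0.2}

old = essential_hits(baseline, essential, 0.25)  # top 2 of 8 proteins
new = essential_hits(combined, essential, 0.25)
print(old, new, relative_improvement(new, old))
```

Repeating the count at each cutoff from 5% to 25% gives one improvement figure per cutoff, which is how a range of improvements would arise.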
Keywords/Search Tags: protein-protein interaction network, clustering algorithm, protein complex, essential protein