Font Size: a A A

Modular Structure Identification In Protein Interaction Networks

Posted on:2012-10-13Degree:DoctorType:Dissertation
Country:ChinaCandidate:L YuFull Text:PDF
GTID:1118330338450242Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Identification of modular structures in protein interaction networks is the first step in understanding the organization and dynamics of cell functions. Analyzing the topological characteristics of protein interaction networks and on the basis of its modularity, identifying protein complexes and functional modules, predicting the functions of unknown proteins, and predicting new functions of known proteins are becoming the hot issues in the domestic and overseas researches.Based on the topological characteristics of protein interaction networks, several effective algorithms are proposed for the detection of protein complexes or functional modules. Further, the work is extended to complex networks. The main original works include:Based on MCODE (Molecular Complex Detection) and GN (Girvan and Newman) algorithms, a new algorithm is proposed to identify protein complexes. For MCODE and GN algorithms, we first analyze their advantages and disadvantages. Then, according to the topology of protein networks, a novel algorithm Combining MCODE with GN is presented. Our algorithm can accurately discover denser modules in large-scale protein interaction networks. We apply our clustering algorithm to S.cerevisiae protein interaction networks. We obtain a high matching rate between predicted modules and known protein complexes in MIPS (Munich Information Center for Protein Sequences). The results show that the new algorithm identifies many well-known protein complexes.A further study on the organization of protein complexes demonstrated that a protein complex generally should contain a core and attachments. According to the properties of protein cores and attachments, a method is presented based on local density and random walks (LDRM) for core-attachment complexes detection. Our LDRM method contains of two stages. Firstly, it finds all the protein-complex cores based on local density of subnetworks. Then it uses random walks with restarts for finding the attachment proteins of each detected core to form complexes. We evaluate the effectiveness of our method using a weighted and an unweighted yeast protein interaction networks. We validate the biological significance of the predicted protein complexes using known complexes in MIPS and GO (Gene Ontology) databases. We also perform a comprehensive comparison between our method and other existing methods. The results show that our method can find more protein complexes with high biological significance and obtain a significant improvement. Furthermore, our method is able to identify biologically significant overlapped protein complexes.According to the properties of maximal frequent patterns and protein cores, we present a method for core-attachment complexes identification in protein interaction networks. Firstly, we detect subgraphs with high degree as candidate protein cores by mining maximal frequent patterns. Then using topological and functional similarities, we combine highly similar protein cores and filter insignificant ones. Finally, the core-attachment complexes are formed by adding attachment proteins to each significant core. We experimentally evaluate the performance of our method CCiMFP on yeast protein interaction networks. Using gold standard sets of protein complexes, GO, and localization annotations, we show our method gains an improvement over previous algorithms in terms of precision, recall, and biological significance of the predicted complexes. The colocalization scores of our predicted complex sets are higher than those of two known complex sets. Moreover, our method can detect GO-enriched complexes with disconnected cores compared with other methods based on subgraph connectivity.Based on the within-module and between-module edges of subgraphs and average degree, we present a formal module definition in protein interaction networks. Using the new module definition, an effective quantitative measure is introduced for evaluating the partition of a protein interaction network. As a further application of the measure, a hierarchical agglomerative clustering algorithm is developed for complexes detection within protein interaction networks. We use gold standard sets of protein complexes to validate the biological significance of predicted complexes. A comprehensive comparison is performed between our method and other four representative methods. The new method is robust to the known high rate of false positives and false negatives in data from high-throughput interaction techniques. Thus it is used in protein interaction networks even with high false positives and high false negatives to identify new protein complexes. Furthermore, the complexes, dense or sparse, predicted by our method match well with known biological characteristics.Based on local connectivity of communities, we propose a novel quantitative function for partitioning networks into communities. Recently, modularity function Q has been widely used as a measure to identify communities in complex networks. However, optimizing Q function has some resolution limitations. Thus, we present a new quantitative function DQ (degree modularity). We prove that the function DQ can improve the resolution limitations of modularity Q. Furthermore, we experimentally evaluate the performance of the new quantitative function using a variety of real and computer-generated networks. Communities of widely differing sizes can be detected with higher sensitivity and reliability. Also, even in large-scale biological networks, such as protein interaction networks, we can obtain higher matching rate between predicted protein modules and known protein complexes.The clustering algorithms proposed in this paper start off from different sights and solve some problems effectively in the processes of clustering in protein interaction networks. Also, our work is extended to complex networks. The proposed clustering algorithms have good clustering performances. The identified protein complexes or functional modules are proved to be statistically significant. These results will provide references for biologists in protein complexes or functional modules identification experiments and their further research. In addition, the proposed quantitative function for partitioning networks into communities shows a good performance on real and computer-generated networks.
Keywords/Search Tags:Protein interaction network, Modular structure, Clustering, Protein complex, Functional module
PDF Full Text Request
Related items