Font Size: a A A

Clustering Algorithms And Its Application For Network Modularity Analysis

Posted on:2012-12-03Degree:DoctorType:Dissertation
Country:ChinaCandidate:P G SunFull Text:PDF
GTID:1480303362452834Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the development of informationization of society, most of complex systems can be modeled as a graph model, which can be used to study complex systems and help us understand their functions. One significant characteristic of complex systems is modularity ? groups of vertices within which connection are dense, but between which are sparse. The study on this characteristic not only can help us understand the mechanism of complex systems, but also are great importance for the control, prediction, variety and development of complex systems.The main work and achievements of the author on network modularity analysis are systematically introduced in this dissertation.(1) We propose a novel method based on fuzzy clustering to detect community structure in complex networks. In contrast to previous studies, our method does not focus on a graph model, but rather on a fuzzy relation model, which uses the operations of fuzzy relation to replace a traversal search of the graph for identifying community structure. We map the communities as equivalence classes that satisfy a certain equivalence relation. We test our method on artificial networks and real-world networks with known communities, the results show that our method works well for detecting these communities, and can also be used for overlapping community structure detection.(2) We put forward an efficient method based on a minimum entropy transitive criterion for community structure detection in complex networks. Based on this criterion, an interaction graph is first denoted by a fuzzy relation, and the similarity entropy is proposed to determine the membership grade of the relation, and the lower the similarity entropy, the more similar the vertices. Then, the fuzzy relation is transformed into a minimal fuzzy relation by a novel composition rule of fuzzy relation that is presented here. Finally, this minimal fuzzy relation is partitioned into clusters based on the value of similarity entropy. The results both in artificial networks and real-world networks show that our criterion readjusts the distribution of similarity entropy between two vertices and makes it more convergent, which leads to the networks'hiberarchy more distinct, and our method based on this criterion is very efficient for community structure detection.(3) We bring out an iterative-clique percolation method (ICPM) for identifying overlapping modules in PPI (protein-protein interaction) networks. Our method is based on clique percolation method (CPM) which not only considers the degree of nodes to minimize the search space (The vertices in k-cliques must have the degree of k-1 at least), but also converts k-cliques to (k-1)-cliques. It uses (k-1)-cliques by appending one node to (k-1)-cliques for finding k-cliques. The results show that the time complexity declines quickly.(4) We present a method based on module alignment which integrates protein interaction and sequence information for finding conserved protein complexes. First, our method decomposes PPI networks into modules by module detection methods, and then identifies conserved complexes by module alignment based on sequence similarity between pairs of proteins from each of the species. The results show that the identified conserved modules have higher consistency in functional annotation, and the conserved modules detected from the modules decomposed by MCL outperform other algorithms, but some conserved modules can not be detected due to the decomposition of PPI networks.(5) Since genes associated with similar diseases/disorders show an increased tendency for their protein products to interact with each other through protein–protein interactions (PPI), clustering analysis obviously as an efficient technique can be easily used to predict human disease-related gene clusters/subnetworks. We use clustering algorithms, Markov cluster algorithm (MCL), Molecular complex detection (MCODE) and Clique percolation method (CPM) to decompose human PPI network into dense clusters, and then a log likelihood model that integrates multiple biological evidences is proposed to score these clusters. Finally, we identify disease-related clusters using these dense clusters if they have higher scores. The efficiency is evaluated by a leave-one-out cross validation procedure. We find that the clusters decomposed by CPM outperform MCL and MCODE as the candidates of disease-related clusters. We also find that most of the disease-related clusters consist of tissue-specific genes that are highly expressed only in one or several tissues, and a few of those are composed of housekeeping genes (maintenance genes) that are ubiquitously expressed in most of all the tissues.In conclusion, this paper focuses on the clustering algorithms for network modularity analysis and its application in real-world networks, in particular for the functional modules detection and disease-related clusters prediction in PPI networks. The clustering algorithms mentioned here can also be applied to other complex networks with similar structure.
Keywords/Search Tags:Clustering Algorithms, Modularity, Protein-Protein Interaction (PPI)
PDF Full Text Request
Related items