
Analysis And Evaluation Of Clustering Algorithm For The Protein Interaction Networks

Posted on: 2014-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: X H Wu
Full Text: PDF
GTID: 2250330425474904
Subject: Computer technology
Abstract/Summary:
In the post-genome era, with the development of high-throughput proteomics, the volume of available protein-protein interaction data is increasing rapidly. An important challenge is to systematically analyze and comprehensively understand how proteins carry out life activities by interacting with each other. In this thesis, we start from the topological characteristics of the protein network and integrate relevant biological data to analyze and evaluate clustering algorithms for protein interaction networks. The main research contents and contributions are as follows:

An algorithm named IPCIPG, which integrates protein interaction data and gene expression data, is proposed for identifying protein complexes. It can identify both overlapping and non-overlapping complexes in the protein interaction network. The experimental results show that IPCIPG matches more known complexes than HUNTER, HC-PIN, SPICI, CMC, MCODE, and MCL.

A new model is proposed to identify and distinguish protein complexes and functional modules in the protein interaction network, and two clustering algorithms, named TSN-PCD and DFM-CIN, are applied to this model to identify protein complexes and functional modules, respectively. The experimental results indicate that TSN-PCD identifies protein complexes more accurately than several other algorithms. Function enrichment analysis using annotation information from the GO database also indicates that most of the functional modules identified by DFM-CIN are involved in specific biological processes.

A new evaluation method named hF-measure is proposed that takes into account the hierarchical nature of protein function. Compared with the F-measure, hF-measure additionally considers the functional similarity among the proteins in a complex, based on annotation information from the GO database, as well as the complexes' topological characteristics.
The experimental results on both artificial test data and clustering results show that hF-measure not only evaluates identified protein complexes accurately but also captures variation in a protein complex's topology.

Finally, we design and implement a new visual analysis platform named ClusterE. ClusterE is an extensible platform in which clustering algorithms and evaluation methods are implemented as plug-ins. So far, seven clustering algorithms, including IPCIPG, and ten evaluation methods, including hF-measure, have been implemented in ClusterE. Using the charts and tables that ClusterE provides, one or several clustering algorithms can be evaluated or compared quickly and effectively.
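The plain F-measure that hF-measure is compared against can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say how a predicted complex is matched to a known one, so the sketch assumes the overlap score |A ∩ B|² / (|A|·|B|) with a threshold of 0.2, a convention often used in this area. The function names and the toy protein labels are hypothetical, and the GO-based functional similarity and topological terms that hF-measure adds are not modeled here.

```python
def overlap_score(a, b):
    """Overlap between two complexes: |A ∩ B|^2 / (|A| * |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    common = len(a & b)
    return common * common / (len(a) * len(b))


def f_measure(predicted, known, threshold=0.2):
    """Return (precision, recall, F) for predicted vs. known complexes.

    A predicted complex counts as matched if it overlaps at least one
    known complex with a score >= threshold, and vice versa.
    The threshold of 0.2 is an assumption, not taken from the thesis.
    """
    matched_pred = sum(
        1 for p in predicted
        if any(overlap_score(p, k) >= threshold for k in known)
    )
    matched_known = sum(
        1 for k in known
        if any(overlap_score(p, k) >= threshold for p in predicted)
    )
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_known / len(known) if known else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Toy data with hypothetical protein labels.
predicted = [{"A", "B", "C"}, {"D", "E"}, {"X", "Y"}]
known = [{"A", "B", "C", "F"}, {"D", "G", "H", "I", "J"}]
precision, recall, f = f_measure(predicted, known)
print(precision, recall, f)
```

In this toy case only the first predicted complex reaches the threshold (its overlap score with the first known complex is 0.75), so one of three predictions and one of two known complexes are matched. A measure of this kind treats every matched complex alike, which is the limitation the abstract says hF-measure addresses by also using functional similarity and topology.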
Keywords/Search Tags: bioinformatics, protein interaction network, clustering algorithm, protein complex, functional module