The Research Of Parallel Edge-Betweenness Clustering Algorithm For Large Protein-Protein Interaction Network

Posted on: 2014-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: D P Fan
GTID: 2248330398469589
Subject: Computer software and theory
Abstract/Summary:
Biomedical research is now in the post-genome era. One of its most important problems is to systematically analyze and comprehensively understand how proteins carry out life activities by interacting with one another. In particular, discovering protein communities in large-scale protein-protein interaction networks is of great significance for predicting the functions of unknown proteins and for explaining specific biological processes.

Various clustering algorithms have been applied to discover biologically significant protein communities in protein-protein interaction networks. Among them, the edge-betweenness algorithm of Girvan and Newman has attracted attention because of its remarkable performance in discovering cluster structure in these networks. Unfortunately, the high cost of computing edge betweenness has been an obstacle to using the Girvan-Newman clustering algorithm on relatively large protein-protein interaction networks. As science and technology develop and experimental methods improve, protein-protein interaction data sets keep growing, and it becomes impractical to use the Girvan-Newman clustering algorithm to discover protein communities.

Parallel computation is an effective way to address this issue. In this work, we used two parallel methods, MapReduce and MPI, to design and implement parallel versions of the Girvan-Newman clustering algorithm. The experimental results show that the parallel computation achieves high performance with MapReduce. Both MapReduce and MPI are message-passing methods; compared with shared-memory parallel methods, they are effective at lower cost and offer better scalability.
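To illustrate why edge betweenness lends itself to a map/reduce style of parallelization, here is a minimal single-process sketch in Python. It is not the implementation described in the abstract: it assumes an unweighted, undirected graph stored as a dict of neighbor sets, uses Brandes-style dependency accumulation for the per-source computation, and all function names (`source_contributions`, `merge`, `edge_betweenness`, `girvan_newman_step`) are hypothetical. The point it tries to show is that each source vertex can be processed independently (the "map" step) and the partial scores simply summed (the "reduce" step).

```python
from collections import defaultdict, deque
from functools import reduce


def source_contributions(adj, s):
    """'Map' step: contributions to edge betweenness from shortest paths starting at s."""
    dist = {s: 0}
    sigma = defaultdict(float)   # number of shortest paths from s to each vertex
    sigma[s] = 1.0
    preds = defaultdict(list)    # predecessors on shortest paths from s
    order = []                   # vertices in non-decreasing distance from s
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)

    delta = defaultdict(float)   # dependency of s on each vertex
    contrib = defaultdict(float)
    for w in reversed(order):    # accumulate from the farthest vertices back to s
        for v in preds[w]:
            c = sigma[v] / sigma[w] * (1.0 + delta[w])
            contrib[frozenset((v, w))] += c
            delta[v] += c
    return contrib


def merge(total, partial):
    """'Reduce' step: add one source's partial scores into the running total."""
    for edge, c in partial.items():
        total[edge] += c
    return total


def edge_betweenness(adj):
    # Each call to source_contributions is independent of the others, so this
    # map could be distributed across workers (assumption: adj is readable by all).
    partials = map(lambda s: source_contributions(adj, s), adj)
    total = reduce(merge, partials, defaultdict(float))
    # In an undirected graph every vertex pair is counted from both endpoints.
    return {edge: c / 2.0 for edge, c in total.items()}


def girvan_newman_step(adj):
    """Remove the edge with the highest betweenness in place and return it."""
    scores = edge_betweenness(adj)
    if not scores:
        return None
    top = max(scores, key=scores.get)
    u, v = tuple(top)
    adj[u].discard(v)
    adj[v].discard(u)
    return top
```

For example, on a graph made of two triangles joined by a single bridge edge, the bridge receives the highest score and is the first edge removed by `girvan_newman_step`. Repeating that step means recomputing betweenness after every removal, which is the cost that motivates parallelizing the per-source work.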
Keywords/Search Tags: Protein-Protein Interaction networks, Clustering, Parallel, Edge-Betweenness