Font Size: a A A

On Load-Balancing For Parallel And Distributed Overlapping Community Metrics Computation

Posted on:2020-01-13Degree:MasterType:Thesis
Country:ChinaCandidate:Y S WuFull Text:PDF
GTID:2370330599954707Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Networks in the real world have an implicit community structure,which divides the vertices in the network into multiple different groups,each of which is a community.In the overlapping communities,a vertex can belong to more than one community.Analyzing overlapping community structures can reveal networks' hidden meaningful structures and functionality.Different overlapping community algorithms can detect different community structures and overlapping community metrics(OCM)can be used to evaluate the accuracy of the corresponding algorithms.With the advent of the era of big data and the rapid growth of data scale,the number and size of overlapping community structures in large-scale networks have grown tremendously.The high complexity of computational complexity present great challenges to existing serial algorithms.Meanwhile,data partition between nodes in a cluster has a critical impact on the performance of parallel distributed algorithms.An efficient parallel and distributed OCM computation requires an appropriate data partition for balancing the workload among nodes.However,the existing OCM computation algorithms are serial algorithms,whose computation efficiency can be improved.This paper designs a load-balancing parallel distributed algorithm for overlapping community quality metrics computation.The overlapping community metrics can be used to evaluate the quality of detected overlapping communities and optimize the detection algorithms.In this paper,we first present a parallel and distributed algorithm(P&D)for computing OCM.Then,the existing data partitioning strategies are summarized to show how they can be applied for OCM,and a greedy partitioning strategy based on load estimation(Greedy-based)is proposed to balance the load between each node.Finally,an empirical comparison study is conducted to evaluate the performance of the proposed algorithm and data partition strategy with five real massive networks' datasets.The experimental results show that compared with the existing Hash-based and Range-based partitioning strategies,the load-based greedy grouping strategy proposed in this paper achieves better performance in memory,communication and CPU load distribution and overall execution time.This verifies that the grouping strategy proposed in this paper can achieve load balancing between nodes in parallel distributed algorithms compared with existing strategies.At the same time,parallel distributed algorithms perform better than existing serial algorithms(including Clubmark and Mutual3)in terms of execution time,speedup,and scalability.The results show that the proposed parallel and distributed algorithms achieve an improvement ranging from 10.7% to 89.4% in execution time and achieve an acceleration of 1.12 to 9.43.
Keywords/Search Tags:Overlapping Community Metrics, Parallel and Distributed Computing, Load Balancing, Data Partition
PDF Full Text Request
Related items