On Load-Balancing For Parallel And Distributed Overlapping Community Metrics Computation

Posted on:2020-01-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y S Wu

Full Text:PDF

GTID:2370330599954707

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

Networks in the real world have an implicit community structure,which divides the vertices in the network into multiple different groups,each of which is a community.In the overlapping communities,a vertex can belong to more than one community.Analyzing overlapping community structures can reveal networks' hidden meaningful structures and functionality.Different overlapping community algorithms can detect different community structures and overlapping community metrics(OCM)can be used to evaluate the accuracy of the corresponding algorithms.With the advent of the era of big data and the rapid growth of data scale,the number and size of overlapping community structures in large-scale networks have grown tremendously.The high complexity of computational complexity present great challenges to existing serial algorithms.Meanwhile,data partition between nodes in a cluster has a critical impact on the performance of parallel distributed algorithms.An efficient parallel and distributed OCM computation requires an appropriate data partition for balancing the workload among nodes.However,the existing OCM computation algorithms are serial algorithms,whose computation efficiency can be improved.This paper designs a load-balancing parallel distributed algorithm for overlapping community quality metrics computation.The overlapping community metrics can be used to evaluate the quality of detected overlapping communities and optimize the detection algorithms.In this paper,we first present a parallel and distributed algorithm(P&D)for computing OCM.Then,the existing data partitioning strategies are summarized to show how they can be applied for OCM,and a greedy partitioning strategy based on load estimation(Greedy-based)is proposed to balance the load between each node.Finally,an empirical comparison study is conducted to evaluate the performance of the proposed algorithm and data partition strategy with five real massive networks' datasets.The experimental results show that compared with the existing Hash-based and Range-based partitioning strategies,the load-based greedy grouping strategy proposed in this paper achieves better performance in memory,communication and CPU load distribution and overall execution time.This verifies that the grouping strategy proposed in this paper can achieve load balancing between nodes in parallel distributed algorithms compared with existing strategies.At the same time,parallel distributed algorithms perform better than existing serial algorithms(including Clubmark and Mutual3)in terms of execution time,speedup,and scalability.The results show that the proposed parallel and distributed algorithms achieve an improvement ranging from 10.7% to 89.4% in execution time and achieve an acceleration of 1.12 to 9.43.

Keywords/Search Tags:

Overlapping Community Metrics, Parallel and Distributed Computing, Load Balancing, Data Partition

PDF Full Text Request

Related items

1	Research On Decentralized Load Balancing Method For Distributed Spatiotemporal Database
2	Research On Graph Partitioning Algorithms In Distributed Environment
3	Research On Partition Methods On Large-Scale Dynamic Graph Based On Multi-Level Division
4	Distributed Computing Technology Study And Implementation
5	Efficient Storage And Parallel Overlay Analysis Of Massive Vector Data In The Cloud Computing Environment
6	Research On Load Balancing Of Application Server On Distributed GIS
7	Load Balancing Algorithm In The Large-scale CFD Applications With Multi-zone Structured Grids
8	Hybrid Parallel Neighbor-related Computing For Vector Point And Raster Big Data Based On Heterogeneous CPU-GPU Systems
9	Research And Implementation On Distributed Partition Algorithm Based On Heuristic Of Large Graph Data
10	Research On Graph Partitioning In Distributed Graph Computing