The Design And Implementation Of Community Detect Algorithm Based On Spark For Large Scale And Complex Network

Posted on:2018-05-29

Degree:Master

Type:Thesis

Country:China

Candidate:D Y Yin

GTID:2310330518496451

Subject:Computer Science and Technology

Abstract/Summary:

In the real world, many complex systems take the form of complex networks or can be converted into them. Complex networks usually exhibit community structure, in which nodes within a community are densely connected while connections between communities are sparse. Social networks have now reached billions of users and continue to grow explosively every day. Finding community structure in large-scale complex networks is therefore important both for theoretical research on network structure and for practical applications of network analysis. This thesis studies large-scale complex networks on the Spark distributed computing framework, and its work covers the following areas.

First, a novel community discovery algorithm, LinkSHRINK, is proposed, based on the community discovery algorithm SHRINK and the concept of the edge graph. The algorithm combines density-based community discovery, modularity-based optimization, and hierarchical clustering. Its advantages are deterministic detection results, accurate community structure with outliers identified, and community structure that avoids excessive overlap. In addition, LinkSHRINK introduces a new concept, community overlap, which allows it to find community structures with different degrees of overlap. Experimental results on real-world and synthetic networks show that LinkSHRINK performs better than most traditional algorithms.

Second, because LinkSHRINK cannot run normally on large-scale networks, the thesis proposes a new algorithm, PLinkSHRINK, which solves this problem by using graph sampling and the Spark distributed computing framework. As a baseline for comparison, the thesis also implements a parallel version of LinkSHRINK on the Hadoop platform, named MLinkSHRINK. Experiments show that PLinkSHRINK outperforms both MLinkSHRINK and LinkSHRINK.

Finally, the thesis builds BDAP, an efficient and convenient large-scale mining system based on the distributed computing framework. The system integrates the corresponding graph-attribute calculation algorithms and community discovery algorithms, and interacts with the user through a workflow pattern, which makes it convenient to use.
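
The edge graph concept mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the standard definition of an edge graph (line graph): every edge of the original network becomes a vertex, and two such vertices are adjacent when the original edges share an endpoint. The function names `build_edge_graph` and `link_groups_to_node_communities` are hypothetical, and the link groups in the usage example are picked by hand. This is not the LinkSHRINK algorithm itself; it only shows why grouping links rather than nodes naturally yields overlapping node communities.

```python
from collections import defaultdict
from itertools import combinations


def build_edge_graph(edges):
    """Return the edge graph (line graph) of an undirected simple graph.

    Each original edge becomes a vertex of the edge graph; two vertices
    are adjacent when the corresponding edges share an endpoint.
    Self-loops are ignored (an assumption of this sketch).
    """
    # Normalise edges so that (u, v) and (v, u) name the same link.
    links = sorted({tuple(sorted(e)) for e in edges if e[0] != e[1]})

    # Index the links incident to each original node.
    incident = defaultdict(list)
    for link in links:
        for node in link:
            incident[node].append(link)

    # Any two links meeting at the same node are adjacent in the edge graph.
    adjacency = {link: set() for link in links}
    for node_links in incident.values():
        for a, b in combinations(node_links, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def link_groups_to_node_communities(link_groups):
    """Map groups of links back to node sets, which may overlap."""
    return [{node for link in group for node in link} for group in link_groups]


# Two triangles that share node 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
edge_graph = build_edge_graph(edges)

# Hand-picked link groups, one per triangle (not produced by LinkSHRINK).
groups = [[(1, 2), (2, 3), (1, 3)], [(3, 4), (4, 5), (3, 5)]]
communities = link_groups_to_node_communities(groups)
```

Each link belongs to exactly one group here, yet node 3 ends up in both node communities, because it is an endpoint of links in both groups.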

Keywords/Search Tags:

Overlapping community discovery algorithm, Spark, Parallel Computing, LinkSHRINK

Related items

1	Research And Parallelization Of Overlapping Community Discovery Algorithm Based On Local Extension
2	Study And Improvement Of Overlapping Community Discovery Based On Local Expansion
3	Research On Local Expanding Class Overlapping Community Discovery Algorithms
4	Research And Improvement Of Community Discovery Algorithm Based On Spark For Large Scale Complicated Networks
5	Spark Application Implementation On Multilayer Line Graph Mapping In Overlapping Communities
6	Research On GraphX-based Overlapping Community Discovery Algorithm
7	Research And Implementation Of Dynamic Overlapping Community Discovery Algorithm
8	Research And Implementation Of Algorithm For Complex Network Overlapping Community Structure Discovery
9	Overlapping Community Discovery Algorithm Based On Clique Graph Clustering
10	Parallel Computing Of Spark-based Geospatial Analysis Algorithms