The Design And Implementation Of Community Detection Algorithm Based On Spark For Large-Scale Complex Networks

Posted on: 2018-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Yin
Full Text: PDF
GTID: 2310330518496451
Subject: Computer Science and Technology
Abstract/Summary:
In the real world, many complex systems take the form of complex networks or can be converted into them. Complex networks usually exhibit community structure: nodes within a community are densely connected, while connections between communities are sparse. Social networks now have user bases on the order of billions and grow explosively every day. Finding community structure in large-scale complex networks is therefore important both for theoretical research on network structure and for the practical application of network analysis. This thesis studies community detection in large-scale complex networks on the Spark distributed computing framework. Its main contributions are as follows.

First, a novel community discovery algorithm, LinkSHRINK, is proposed, based on the community discovery algorithm SHRINK and the concept of the edge graph. It combines density-based community discovery, modularity-based optimization, and hierarchical clustering. Its advantages are that detection results are deterministic, that it finds accurate community structure together with outliers, and that it avoids excessive overlap between communities. LinkSHRINK also introduces a new concept, community overlap, which allows it to find community structures with different degrees of overlap. Experimental results on real-world and synthetic networks show that LinkSHRINK performs better than most traditional algorithms.

Second, because LinkSHRINK cannot run properly on large-scale networks, this thesis proposes a new algorithm, PLinkSHRINK, which addresses the problem using graph sampling and the Spark distributed computing framework. For comparison, a parallel version of LinkSHRINK on the Hadoop platform, named MLinkSHRINK, is also implemented. Experiments show that PLinkSHRINK outperforms both MLinkSHRINK and LinkSHRINK.

Finally, this thesis builds BDAP, an efficient and convenient large-scale data mining system based on the distributed computing framework. The system integrates graph-attribute computation algorithms and community discovery algorithms, and interacts with users through a workflow pattern, which makes it convenient to use.
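
The abstract does not define the edge graph, so the following is only a minimal Python sketch under one common reading: each edge of the original network becomes a vertex, and two such vertices are adjacent when the original edges share an endpoint. The function names `build_edge_graph` and `node_communities`, the toy network, and the hand-picked link partition are all hypothetical illustrations; this is not the LinkSHRINK implementation. The point it illustrates is that grouping links rather than nodes can yield overlapping node communities.

```python
from collections import defaultdict
from itertools import combinations


def build_edge_graph(edges):
    """Return the edge graph as an adjacency dict (assumed definition):
    one vertex per original edge, adjacent when the edges share an endpoint."""
    # Normalise undirected edges so (u, v) and (v, u) map to the same vertex.
    links = sorted({tuple(sorted(e)) for e in edges})
    incident = defaultdict(list)
    for link in links:
        for node in link:
            incident[node].append(link)
    adjacency = {link: set() for link in links}
    # All links incident to the same node are pairwise adjacent.
    for node_links in incident.values():
        for a, b in combinations(node_links, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def node_communities(link_communities):
    """Map communities of links back to (possibly overlapping) node sets."""
    return [{n for link in community for n in link}
            for community in link_communities]


# Hypothetical toy network: two triangles that share node 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
edge_graph = build_edge_graph(edges)

# A hand-picked partition of the links, one group per triangle
# (a real algorithm would compute this by clustering the edge graph).
communities = node_communities([edges[:3], edges[3:]])
overlap = communities[0] & communities[1]
```

In this toy case every link belongs to exactly one group, yet node 3 ends up in both node communities, which is the kind of overlap a link-based method can express.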
Keywords/Search Tags:Overlapping community discovery algorithm, Spark, Parallel Computing, LinkSHRINK