Research On Large-scale Complex Network Community Detection Algorithm Based On Spark

Posted on: 2020-03-10  Degree: Master  Type: Thesis
Country: China  Candidate: S J Huo  Full Text: PDF
GTID: 2518306464994989  Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Many systems in nature and human society can be represented as complex networks, and community structure is one of the most common and important topological properties of such networks. A community is a set of nodes whose interactions with one another are stronger than their interactions with nodes outside the set. In the era of big data the number of network users is growing explosively, and complex networks are rapidly becoming more varied and larger in scale. Detecting community structure in large-scale complex networks is therefore of significant theoretical and practical value. Building on the Spark distributed computing framework, this thesis studies community detection in large-scale complex networks. Its main contributions are as follows.

(1) The relationship between maximal cliques and community structure is studied first. The study finds that the nodes of a maximal clique belong to the same community with high probability. A label initialization strategy based on maximal cliques is therefore introduced into the synchronous label propagation algorithm to improve its convergence speed and accuracy. Next, to address the low accuracy of the original synchronous label propagation algorithm, an update rule based on community similarity is proposed. It guides the label update process by taking node centrality and community similarity into account, and it significantly improves the accuracy of synchronous label propagation. Finally, combining the maximal clique structure with the community-similarity update rule yields a label propagation community detection algorithm based on the maximal clique structure, named CSS-LPA. Experiments on small-scale real-world network datasets and small-scale LFR benchmark networks compare iteration counts, modularity, and normalized mutual information (NMI). The results show that CSS-LPA achieves higher community detection accuracy and converges faster.

(2) To perform community detection on large-scale complex networks, a parallel label propagation community detection algorithm based on the maximal clique structure, named CSPS-LPA, is proposed; it implements CSS-LPA on the Spark platform. First, a parallel maximal clique mining algorithm is implemented on Spark. Then a parallel label propagation community detection algorithm based on community similarity is built: it introduces the community-similarity update rule into synchronous label propagation and is implemented on GraphX, Spark's graph computing framework. Experiments on large-scale real-world network datasets and large-scale LFR benchmark networks compare modularity, NMI, and speedup. The results show that CSPS-LPA achieves higher community detection accuracy and scales well on the distributed computing platform.
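The abstract does not say how maximal cliques are mined or how clique labels are assigned. As an illustration only, the following single-machine Python sketch enumerates maximal cliques with the standard Bron-Kerbosch algorithm with pivoting and then applies a hypothetical initialization rule. Cliques of at least `min_size` nodes (assumed here to be 3) are processed from largest to smallest, the not-yet-labeled nodes of each clique receive a shared label, and every remaining node keeps its own ID. The names `build_adjacency`, `bron_kerbosch`, `clique_initial_labels` and the toy graph are assumptions, not taken from the thesis.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bron_kerbosch(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is a maximal clique
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_initial_labels(adj, min_size=3):
    """HYPOTHETICAL rule: nodes of one large maximal clique share a label."""
    labels = {}
    # Largest cliques first; sorted node lists make the order deterministic.
    for clique in sorted(bron_kerbosch(adj), key=lambda c: (-len(c), sorted(c))):
        if len(clique) < min_size:
            break  # every remaining clique is smaller still
        free = [v for v in clique if v not in labels]
        if free:
            # min(free) is a node ID no earlier group has used, so labels stay unique.
            label = min(free)
            for v in free:
                labels[v] = label
    for v in adj:
        labels.setdefault(v, v)  # nodes outside large cliques keep their own ID
    return labels


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),  # 4-clique {0, 1, 2, 3}
         (4, 5), (4, 6), (5, 6),                          # triangle {4, 5, 6}
         (3, 4), (6, 7)]                                  # bridge and pendant edge
adj = build_adjacency(edges)
print(dict(sorted(clique_initial_labels(adj).items())))
# {0: 0, 1: 0, 2: 0, 3: 0, 4: 4, 5: 4, 6: 4, 7: 7}
```

Starting label propagation from such groups rather than from one label per node is what the abstract credits for faster convergence; the size threshold and the tie-breaking order above are design choices of this sketch.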
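The community-similarity update rule itself is not given in the abstract. The sketch below shows a generic synchronous label propagation loop in which every node recomputes its label from the previous iteration's labels of its neighbours. The vote weight `deg(u) * (1 + common neighbours of u and v)` is a hypothetical stand-in that combines a degree-centrality term with a simple structural-similarity term; it is not the thesis's rule. The loop is organized like a Pregel superstep (send messages, merge messages per vertex, update each vertex), the model that GraphX's `Pregel` operator exposes through its `sendMsg`, `mergeMsg` and `vprog` functions, but the code is plain Python, not Spark.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def synchronous_lpa(adj, labels, max_iter=20):
    """Synchronous label propagation with a HYPOTHETICAL weighted vote."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}

    def weight(u, v):
        # Hypothetical: sender's degree (centrality) times a common-neighbour similarity.
        return deg[u] * (1 + len(adj[u] & adj[v]))

    labels = dict(labels)  # do not mutate the caller's dict
    for _ in range(max_iter):
        # sendMsg / mergeMsg analogue: each node u sends (label, weight) to every
        # neighbour v, and the messages arriving at v are summed per label.
        scores = {v: defaultdict(int) for v in adj}
        for u in adj:
            for v in adj[u]:
                scores[v][labels[u]] += weight(u, v)
        # vprog analogue: all nodes update from the same snapshot (synchronous).
        new_labels = {}
        for v in adj:
            if not scores[v]:
                new_labels[v] = labels[v]  # isolated node: nothing to adopt
                continue
            best = max(scores[v].values())
            tied = [lab for lab, s in scores[v].items() if s == best]
            # Keep the current label on a tie, otherwise take the smallest label.
            new_labels[v] = labels[v] if labels[v] in tied else min(tied)
        if new_labels == labels:
            break  # converged: no label changed in this superstep
        labels = new_labels
    return labels


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (5, 6), (3, 4), (6, 7)]
initial = {0: 0, 1: 0, 2: 0, 3: 0, 4: 4, 5: 4, 6: 4, 7: 7}  # e.g. clique-based labels
print(dict(sorted(synchronous_lpa(build_adjacency(edges), initial).items())))
# {0: 0, 1: 0, 2: 0, 3: 0, 4: 4, 5: 4, 6: 4, 7: 4}
```

On this toy graph the pendant node 7 adopts label 4 in the first superstep and no label changes in the second. Synchronous updates can oscillate on bipartite-like structures, which is why the sketch caps the number of supersteps with `max_iter`.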
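Modularity, one of the evaluation measures named above, can be computed per community as Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges with both endpoints in c, and d_c the total degree of the nodes in c. A minimal sketch, assuming a simple undirected graph (no self-loops or duplicate edges) and a partition given as a node-to-label dict; the function name `modularity` and the toy graph are illustrative.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity of a partition of a simple undirected graph."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (5, 6), (3, 4), (6, 7)]
partition = {0: 0, 1: 0, 2: 0, 3: 0, 4: 4, 5: 4, 6: 4, 7: 4}
print(round(modularity(edges, partition), 4))  # 0.3926
```

Here m = 11, the two communities have L = 6 and 4 internal edges and total degrees d = 13 and 9, so Q = 10/11 - (13^2 + 9^2)/22^2 = 95/242. Putting every node in a single community gives Q = 0.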
Keywords/Search Tags: Large-scale Complex Network, Spark, Community Detection, Label Propagation, Maximal Clique