
Research On Parallel Label Propagation Community Discovery Method Based On Attribute And Link Relationship

Posted on: 2024-09-12 | Degree: Master | Type: Thesis
Country: China | Candidate: L Z Huang | Full Text: PDF
GTID: 2530307112477654 | Subject: Management Science and Engineering
Abstract/Summary:
With the wide application of complex network research in social science, management science, and engineering technology, community discovery has become an essential topic in the field. Community discovery can uncover implicit knowledge and regularities in the data of complex networks, thus providing ideas and methods for solving practical problems. Many algorithms for detecting the community structure of complex networks have therefore appeared in recent years. Among them, the community discovery algorithm based on label propagation has simple steps, near-linear time complexity, and high efficiency, and it occupies an important position in research on community discovery methods. However, this algorithm has several defects: strong randomness, poor robustness, a tendency to produce a "monster community", and difficulty converging. In addition, with the advent of the big data era, the number of nodes and edges in complex networks keeps growing, so network datasets are becoming increasingly massive and are difficult for traditional community discovery algorithms to analyze.

(1) To address the defects of the traditional label propagation algorithm, this study proposes a label propagation community discovery algorithm based on attribute and link relationships (AE-LPA). The algorithm first combines attribute and link information to assign weights to the unweighted complex network in a preprocessing step; it then proposes improvements to the label initialization phase, the label propagation and update phase, and the label iterative convergence phase of the traditional label propagation algorithm. Experiments are carried out on four network datasets with only link information and two network datasets with both attribute and link information, using modularity Q and normalized mutual information (NMI) as evaluation metrics for the quality of community discovery. The experiments show that the AE-LPA algorithm improves both the efficiency of community discovery and the quality of the resulting partition.

(2) To achieve community discovery on large-scale complex networks, this study proposes a Spark-based parallel label propagation community discovery algorithm (SPAE-LPA), which builds on AE-LPA and uses GraphX, the graph computation module of the Spark distributed in-memory computing framework. The algorithm first implements a parallel rough-kernel extraction algorithm in Spark; it then implements the label propagation and update operations with GraphX; finally, it sets a label difference signal to determine whether the label iteration has converged, following the iterative convergence strategy of AE-LPA. Experiments are conducted on five real networks and five LFR artificial benchmark networks. The experiments show that SPAE-LPA outperforms the existing algorithms in terms of Q, NMI, and running efficiency, which verifies its effectiveness and accuracy.
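
For readers unfamiliar with the baseline, the following is a minimal Python sketch of plain label propagation on a weighted graph, together with the modularity Q metric mentioned in the abstract. It is not the AE-LPA or SPAE-LPA method: the function names (`build_graph`, `label_propagation`, `modularity`), the fixed node order, and the deterministic tie-breaking rule are all assumptions made for illustration. Classic label propagation visits nodes in random order and breaks ties randomly, which is the source of the randomness the abstract criticizes.

```python
from collections import defaultdict


def build_graph(edges):
    """Build a symmetric weighted adjacency map from (u, v, weight) triples."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)


def label_propagation(adj, max_iter=100):
    """Plain asynchronous label propagation on a weighted graph.

    Assumption: nodes are visited in sorted order and ties go to the
    smallest label, so the result is reproducible.
    """
    labels = {v: v for v in adj}  # each node starts in its own community
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated node keeps its own label
            # Sum edge weights per neighbouring label.
            score = defaultdict(float)
            for u, w in adj[v].items():
                score[labels[u]] += w
            best = max(score.values())
            candidates = sorted(l for l, s in score.items() if s == best)
            # Keep the current label if it is already among the best.
            new = labels[v] if labels[v] in candidates else candidates[0]
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:  # converged: no label changed in a full pass
            break
    return labels


def modularity(adj, labels):
    """Weighted modularity Q = (1/2m) * sum_ij (A_ij - k_i*k_j/2m) * [c_i == c_j]."""
    two_m = sum(w for v in adj for w in adj[v].values())
    strength = {v: sum(adj[v].values()) for v in adj}
    q = 0.0
    for v in adj:
        for u in adj:
            if labels[v] == labels[u]:
                q += adj[v].get(u, 0.0) - strength[v] * strength[u] / two_m
    return q / two_m
```

On two triangles joined by a single edge, this sketch merges everything into one community when all edges have weight 1, which is a small instance of the "monster community" problem. If the bridging edge is given a lower weight (a hypothetical value such as 0.2, standing in for the kind of weighting the abstract describes), the two triangles are recovered as separate communities and Q is positive.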
Keywords/Search Tags: Complex networks, Community discovery, Attribute information, Label propagation, Spark