| With the continuous development of social networks, community detection has become an important research hotspot in the field of complex networks. A complete network consists of several communities: connections between nodes within a community are relatively dense, while connections between nodes in different communities are relatively sparse. The label propagation algorithm (LPA) is an excellent community detection algorithm, and its linear time complexity is a great advantage. Although LPA has many advantages, its shortcomings are also obvious. Because labels are selected at random, LPA cannot guarantee consistent results across runs. In addition, after repeated iterations, large communities may swallow small ones. To address these problems, we propose two improved algorithms based on LPA. The specific research results are as follows: (1) Optimization and improvement of LPA. Since LPA does not contain any parameters, we mainly optimize label propagation and label update. In PSLPA (Probability and Similarity based Label Propagation Algorithm), we combine the probability of label propagation with the similarity between nodes; moreover, an adaptive label selection strategy is used to update node labels during label propagation. In WRWLPA (Weight and Random Walk based Label Propagation Algorithm), we propose a new similarity calculation method that combines weight and random walk; weight and similarity are then used to update labels in the label propagation stage. Both algorithms perform well in accuracy and stability. (2) Parallelization. We parallelized both of the algorithms above using the GraphX module on the Spark platform. The algorithm process is transformed into an iterative computation over the network graph, which is transformed through the existing API. For the label propagation process, a custom function is implemented to complete the parallelization of the algorithm. The 
parallel algorithm shows high accuracy and stability on datasets of different scales. |
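
For readers unfamiliar with the baseline, the following is a minimal sketch of plain asynchronous LPA, the starting point that the abstract describes improving. It is not PSLPA or WRWLPA, whose probability, similarity, and weight terms are not specified here, and it is not the GraphX implementation. The function name `label_propagation`, its parameters, and the toy graph are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def label_propagation(adjacency, seed=None, max_sweeps=100):
    """Plain asynchronous LPA on an undirected graph {node: [neighbors]}.

    Illustrative baseline only; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    # Every node starts in its own community, labelled by its own id.
    labels = {node: node for node in adjacency}
    nodes = list(adjacency)

    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # random visiting order in each sweep
        changed = False
        for node in nodes:
            if not adjacency[node]:
                continue  # isolated nodes keep their own label
            # Count how often each label occurs among the neighbors.
            counts = Counter(labels[nbr] for nbr in adjacency[node])
            top = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == top]
            # Keep the current label if it is already among the most
            # frequent ones; otherwise break the tie at random. This random
            # choice is the source of run-to-run inconsistency.
            if labels[node] not in candidates:
                labels[node] = rng.choice(candidates)
                changed = True
        if not changed:
            break  # no label changed during a full sweep
    return labels


# Toy graph (hypothetical): two disconnected triangles.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
labels = label_propagation(graph, seed=0)
```

On this toy graph each triangle settles on a single label, but which label wins depends on the seed. On graphs with weakly separated communities, the same random tie-breaking can also change the partition itself, which is the instability the improved algorithms aim to reduce.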