Research On Community Discovery Method Based On Optimized Label Propagation Algorithm

Posted on: 2024-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Yang
GTID: 2530307055975159
Subject: Computer Science and Technology
Abstract/Summary:
A social network is a relatively stable system of relationships formed by interaction among individual members of society, and community discovery is one of the important tasks of social network analysis. Mining the community structure of social networks helps to understand the organizational characteristics and connection patterns of real networks, discover hidden value, and reveal underlying laws, which in turn helps explain some social phenomena. Community discovery has important theoretical significance and application value in fields such as intelligent recommendation, information dissemination, and precision marketing. As informatization continues to advance, traditional community discovery algorithms struggle to cope with the growing scale and complexity of social networks, and the search for more efficient algorithms that achieve higher-quality community discovery has become a research hotspot.

The label propagation algorithm is a classic community discovery method. It is simple and efficient, has low complexity, does not require the number of communities to be specified, and is suitable for large-scale data sets. However, its division results are unstable and its accuracy is low. This thesis therefore studies the label propagation algorithm from the perspectives of node influence and node similarity, and proposes two improved algorithms to achieve more efficient, stable, and accurate community discovery. The main research content is as follows:

1. To address the poor stability and low accuracy of community division results caused by the randomness of node selection and label update in traditional label propagation algorithms, a label propagation community discovery algorithm that integrates seed node influence and neighborhood similarity is proposed. First, the K-shell values of neighbor nodes are fused with the clustering coefficient to define node influence. An initial seed set is screened by a threshold, and the less influential node of each pair of adjacent seed nodes is then removed to obtain the final seed set. This mitigates the inaccurate community division and high computational complexity caused by improperly chosen seed nodes or too many initial labels. Second, the connection strength between a non-seed node and a seed node is defined based on the nodes' own weights, a distance weight, and a common-neighbor weight, and the label of the non-seed node is updated to the label of the seed node with the maximum connection strength. Furthermore, for the case where a non-seed node has the same connection strength to multiple seed nodes, a new neighborhood similarity is proposed by fusing the information of the two types of nodes and their neighbor nodes, and it is used as the basis for the label update of the non-seed node. This reduces the inconsistency between the community division results of multiple runs that arises when a seed node's label is chosen at random.

2. To address the problem that the community division results of the classic two-stage label propagation algorithm LPA-TS are unstable and prone to producing small-scale communities, an improved two-stage label propagation algorithm based on LeaderRank is proposed. In the first stage, the update order of nodes is determined by the participation coefficient, and a new node similarity is then defined to improve the label selection mechanism. This aims to solve the oscillation problem that arises when the nodes most similar to the current node carry multiple labels. Node influence values are also integrated to obtain the initial community structure. In the second stage, each rough community obtained in the first stage is treated as a node, and the merging order is likewise determined by the participation coefficient. The communities are then merged to further optimize the community structure and obtain the final division result.

For the above algorithms, a large number of comparative experiments were conducted on 9 real social networks and 19 artificial data sets of different sizes and complexities, against dozens of classical community discovery algorithms and improved label propagation algorithms. The modularity and normalized mutual information (NMI) values obtained from multiple runs of each algorithm were used to evaluate the stability of the community partitioning results and the quality of the communities. The experimental results show that the community division results obtained by the proposed algorithms are consistent across social network data sets of different sizes and complexities, and that their modularity and NMI values are generally higher than those of the other algorithms, which verifies the stability, effectiveness, and higher quality of the proposed algorithms' community discovery results.
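For readers unfamiliar with the baseline, the following is a minimal sketch of the classic asynchronous label propagation algorithm that the abstract takes as its starting point. It is not the improved method proposed in the thesis, and it does not implement K-shell influence, seed selection, or LeaderRank. The names `label_propagation`, `adj`, `seed`, and `max_iter` are assumptions made for illustration. The sketch marks the two random choices (update order and tie-breaking) that the abstract identifies as the source of unstable division results.

```python
import random
from collections import Counter


def label_propagation(adj, seed=None, max_iter=100):
    """Baseline asynchronous label propagation (illustrative sketch only).

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    Returns a dict mapping each node to its community label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}   # every node starts with its own unique label
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)         # random update order: first source of instability
        changed = False
        for v in nodes:
            if not adj[v]:
                continue           # an isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[v] in candidates:
                continue           # current label is already among the most frequent
            # Random tie-breaking: second source of instability.
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:
            break                  # every node holds a most-frequent neighbor label
    return labels


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2-3).
    graph = {
        0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
        3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
    }
    for s in range(3):
        print(s, label_propagation(graph, seed=s))
```

Running the sketch with different `seed` values on the same graph can yield different partitions, which is the kind of run-to-run inconsistency the two proposed algorithms aim to reduce by replacing the random choices with influence- and similarity-based rules.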
Keywords/Search Tags:Label propagation algorithm, community discovery, node influence, seed node, neighborhood similarity, LeaderRank