Font Size: a A A

A New Community Discovery Algorithm Based On Label Propagation Algorithm

Posted on:2016-02-17Degree:MasterType:Thesis
Country:ChinaCandidate:X Y LiuFull Text:PDF
GTID:2348330488474530Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the rapid development of Internet in recent years, social networking sites established quickly. Including Facebook, Twitter, Google + abroad, QQ space, Douban, Ren Ren domestic are the most popular. There are a large number of users to use these social networking sites every day, sharing and producing large amounts of data, building new relationships, etc. For these data, it has the very high value, such as network marketing, public opinion analysis, etc. So how to deal with the data is especially important, including community discovery is a hot research topic. This article is mainly aiming at improving the community discovery algorithm which are often used.The main idea of most commonly used Label propagation algorithm of LPA is in the original case, each node in the network is initialized to the unique set of labels. Then in an iterative update the label of each node, update label of the node is depend on the number of labels in its neighbor nodes, if the most number of label is not the only one, then select a label to update the current node randomly. The algorithm doesn't stop until the node achieve convergence or the algorithm shocks. As the ideas and implementation of processing, causing the algorithm has lots of shortcomings, such as instability, and finding either giant communities or meaningless small communities, distribution of communities is extremely uneven and the structure of network is more sensitive, in the binary network circumstances circulation the algorithm shock will happen. For the LPA algorithm of series faults, a new community discovery algorithm proposed in this paper is LAAPA(label-attribute&attenuation progagation algorithm), which is based on the label attributes and attenuation factor. The algorithm introduced transmission attenuation factor and node properties, transmission attenuation factor as another words is the label of node transmission distance is limited, and with the increase of distance of transmission the influence of the node's label reduce gradually, updating the label is not necessary.The attribution of node refers to the relationship between the nodes in the network and is not just a connect with other nodes which focus on the simple traditional mean of edge. In order to match the actual situation in this paper, we propose the node properties which specifically refers to the node's other properties such as in douban site users join in the "team" as the properties of the nodes, the same attributes will be reflected to the edge of the nodes which connect to each other. In the iteration of algorithm, the node updating label will consider the label propagation distance of neighbor node, the weights of the edge, the degrees of nodes. In this paper, LAAPA algorithms and LPA algorithm deal with two sets of standard data for comparing. LAAPA algorithm is more effective than LPA algorithm in the community size, module degrees.By using Scrapy we fetch data over the site of douban. After cleaning and formatting the data was verified the characteristics of the social network, "small world", centricity of node. By using two kinds of community discovery algorithm to find community, the result also finding that the quality of the community that LAAPA algorithm found is more effective, more quality, more stability, and community distribution is more uniform and no presence of the giant community.
Keywords/Search Tags:community detection, Label propagation algorithm, LAAPA, social network
PDF Full Text Request
Related items