Design And Implementation Of Social Network Community Discovery Algorithm Based On Spark

Posted on: 2019-01-11 | Degree: Master | Type: Thesis
Country: China | Candidate: G Wang
GTID: 2428330545469968 | Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology and intelligent hardware, more and more people share their lives, make friends, and obtain information on social platforms, forming social networks with large numbers of nodes and complex connection relationships. Mining the community structure of these networks is therefore of great practical significance. By mining users' community attributes to understand their interests, hobbies, and needs, information can be pushed accurately to targeted user groups, which increases the value of social networking. Community discovery is an important branch of mining the relationships among nodes in social networks, and research on overlapping community structure is both its focus and its main difficulty. In view of the shortcomings of current overlapping community discovery algorithms and the large scale of social networks, this thesis improves the overlapping community discovery algorithm COPRA (Community Overlap Propagation Algorithm) and parallelizes it on the distributed computing framework Spark. The main work of this thesis is as follows:

(1) An improved label propagation algorithm based on edge propagation probability, EPP-COPRA, is proposed. First, following the idea of node centrality and considering how the degrees of a node's first-order and second-order neighbors affect its ability to propagate, a node importance measure based on information entropy, Entropy Centrality, is proposed. Second, influential nodes also have strong label propagation capability, and every node can both transmit and receive labels; the entropy centrality of a node and its neighbors is therefore used to calculate the influence of each edge, which measures the label receiving capability of the nodes. Finally, the similarity between nodes is combined with the edge influence measure to obtain the edge propagation probability between a node and its neighbors, which avoids the randomness of the label selection phase of label propagation.

(2) For large-scale social networks, the EPP-COPRA algorithm is parallelized and deployed on a Spark cluster, using the advantages of the Spark distributed computing framework in in-memory computing, multi-iteration batch processing, and graph computation, to improve the algorithm's ability to handle community discovery in large-scale social networks.

(3) Based on the Spark distributed computing framework, an efficient and convenient social network community discovery system is designed and implemented. The system packages graph attribute calculation and community discovery algorithms as components, and uses workflows to run the sequence of tasks involved in community discovery, which greatly lowers the barrier to use and improves the user's interactive experience.
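
For orientation, the label update at the core of COPRA-style overlapping label propagation can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the thesis's EPP-COPRA or its Spark implementation. The abstract does not give the formulas for entropy centrality or edge propagation probability, so the optional `weight` callback below is only a hypothetical placeholder for such an edge score. The function name `copra_step`, the parameter `v` (maximum communities per node), and the deterministic tie-break are likewise assumptions made for this sketch.

```python
from collections import defaultdict


def copra_step(adj, labels, v, weight=None):
    """Run one synchronous COPRA-style label update (illustrative sketch).

    adj:    dict mapping node -> list of neighbor nodes
    labels: dict mapping node -> {label: belonging coefficient}
    v:      maximum number of communities a node may belong to
    weight: optional function (node, neighbor) -> non-negative edge score;
            a hypothetical stand-in for an edge propagation probability
    """
    new_labels = {}
    for node, neighbors in adj.items():
        # Accumulate the (optionally weighted) coefficients of all neighbors.
        acc = defaultdict(float)
        for nb in neighbors:
            w = 1.0 if weight is None else weight(node, nb)
            for label, coeff in labels[nb].items():
                acc[label] += w * coeff

        total = sum(acc.values())
        if total <= 0:
            # Isolated node or all-zero weights: keep the current labels.
            new_labels[node] = dict(labels[node])
            continue

        normalized = {l: c / total for l, c in acc.items()}

        # Keep only labels whose coefficient reaches the 1/v threshold.
        kept = {l: c for l, c in normalized.items() if c >= 1.0 / v}
        if not kept:
            # Assumption: break ties deterministically (largest coefficient,
            # then smallest label) instead of choosing at random.
            best = min(normalized, key=lambda l: (-normalized[l], l))
            kept = {best: normalized[best]}

        # Renormalize so the remaining coefficients sum to 1.
        s = sum(kept.values())
        new_labels[node] = {l: c / s for l, c in kept.items()}
    return new_labels


if __name__ == "__main__":
    # Path graph a - b - c, each node starting with its own label.
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    labels = {n: {n: 1.0} for n in adj}
    print(copra_step(adj, labels, v=2))
```

In a distributed setting, the same per-node aggregation would typically be expressed as message passing between neighboring vertices. How the thesis maps it onto Spark is not described in the abstract.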
Keywords/Search Tags: community discovery, entropy centrality, label propagation, big data, Spark