
Design And Implementation Of Community Detection Algorithm Based On Spark

Posted on: 2017-03-15 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Li
GTID: 2308330488993968 | Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of Internet technology, mobile terminals and the mobile Internet, more and more people have become Internet users. Social networks are among the most popular Internet applications, and as they keep growing their scale is increasing explosively, so community detection now has to deal with large volumes of data and high complexity. Traditional community detection algorithms can only be applied to small-scale networks or to specific kinds of networks; when a network has a large number of users, they are limited by hardware resources and by their own computational complexity. Because traditional methods have difficulty adapting to large-scale social network scenarios, this thesis applies the Spark data processing framework to a traditional community detection algorithm so that it can process large-scale user data, with the aim of shortening the algorithm's running time. We select the classical label propagation algorithm as the base algorithm and implement a parallel community detection algorithm on Spark. We also improve the algorithm to address problems observed during the experiments. Finally, the efficiency of the algorithm is verified by experiments on Twitter and Facebook social network datasets.
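As a rough illustration of the synchronous label propagation that a Spark implementation parallelizes, the following is a minimal single-machine Python sketch, not the thesis's implementation. It assumes an undirected, unweighted edge list, initial labels equal to vertex ids, and smallest-label tie-breaking in place of the random tie-breaking of classic label propagation; the function name `label_propagation` is illustrative. In Spark, the per-edge message step would map onto a distributed aggregation (for example GraphX's `aggregateMessages`), and GraphX also ships a ready-made `LabelPropagation` in `org.apache.spark.graphx.lib`.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_steps=10):
    """Synchronous label propagation over an undirected, unweighted edge list.

    Each superstep mirrors one round of a Spark job: every edge sends each
    endpoint's current label to the other endpoint, the labels arriving at a
    vertex are counted, and the vertex adopts the most frequent one.
    """
    # Initially every vertex forms its own community (label = vertex id).
    labels = {}
    for u, v in edges:
        labels[u] = u
        labels[v] = v

    for _ in range(max_steps):
        # Message step: each edge delivers labels in both directions.
        received = defaultdict(Counter)
        for u, v in edges:
            received[v][labels[u]] += 1
            received[u][labels[v]] += 1

        # Update step: adopt the most frequent neighbour label. Ties go to the
        # smallest label (an assumption; classic LPA breaks ties at random).
        new_labels = {}
        for vertex, counts in received.items():
            top = max(counts.values())
            new_labels[vertex] = min(lab for lab, c in counts.items() if c == top)

        if new_labels == labels:  # converged: no vertex changed its label
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3).
    demo_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(label_propagation(demo_edges))
```

On the demo graph (two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3)), the sketch converges after a few supersteps to one label per triangle. Because every vertex updates from the previous superstep's labels, each round is embarrassingly parallel, which is what makes the algorithm a good fit for Spark.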
The main work and contributions of this thesis are as follows. Firstly, we study traditional community detection algorithms and analyze their feasibility. We also analyze the theory and technology of Spark and other big data processing tools, which provide technical support for parallelizing the algorithm in the later work. Secondly, we choose the traditional label propagation algorithm as the base algorithm and implement it on Spark to solve the problem of community detection in large-scale networks. Thirdly, to address a problem that the Spark-based label propagation algorithm exhibited in the experiments, we introduce the concept of community core node influence and incorporate it into the label propagation algorithm. To compute the influence of community core nodes, we also implement a Spark-based PageRank algorithm. The improved algorithm is implemented on the Spark platform, its experimental results are analyzed in detail, and related problems are discussed. Finally, the quality of the communities found by the two algorithms is compared.
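The abstract does not say exactly how core node influence enters the label update. One plausible reading, sketched below as an assumption rather than the thesis's method, uses PageRank scores as influence and lets each neighbour's label vote carry that neighbour's PageRank weight; each vertex also votes for its own current label with its own weight (another assumption), which damps the oscillation synchronous label propagation can show. PageRank here is plain power iteration with damping 0.85 and 30 iterations (both assumed values), treating each undirected edge as two directed ones; in Spark, GraphX's `graph.pageRank(tol)` could supply the scores instead. The names `build_neighbours`, `pagerank` and `influence_lpa` are hypothetical.

```python
from collections import defaultdict


def build_neighbours(edges):
    """Adjacency lists for an undirected edge list."""
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    return neighbours


def pagerank(neighbours, damping=0.85, iterations=30):
    """Power-iteration PageRank; every undirected edge counts in both directions."""
    n = len(neighbours)
    rank = {v: 1.0 / n for v in neighbours}
    for _ in range(iterations):
        incoming = defaultdict(float)
        for v, nbrs in neighbours.items():
            share = rank[v] / len(nbrs)  # split rank evenly over out-edges
            for w in nbrs:
                incoming[w] += share
        rank = {v: (1 - damping) / n + damping * incoming[v] for v in neighbours}
    return rank


def influence_lpa(edges, max_steps=10, eps=1e-12):
    """Synchronous label propagation with PageRank-weighted label votes (assumed variant)."""
    neighbours = build_neighbours(edges)
    influence = pagerank(neighbours)
    labels = {v: v for v in neighbours}  # every vertex starts as its own community
    for _ in range(max_steps):
        new_labels = {}
        for v, nbrs in neighbours.items():
            score = defaultdict(float)
            score[labels[v]] += influence[v]  # the vertex's own vote (assumption)
            for w in nbrs:
                score[labels[w]] += influence[w]  # vote weighted by neighbour influence
            best = max(score.values())
            # Scores within eps count as ties (floating-point noise); pick the smallest label.
            new_labels[v] = min(lab for lab, s in score.items() if s >= best - eps)
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels, influence


if __name__ == "__main__":
    # Two 4-cliques {0, 1, 2, 3} and {4, 5, 6, 7} joined by the edge (3, 4).
    clique_a = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    clique_b = [(4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]
    labels, influence = influence_lpa(clique_a + clique_b + [(3, 4)])
    print(labels)
```

On the demo graph the two bridge vertices (3 and 4) get the highest PageRank, and the sketch assigns one label to each clique. Weighting votes by influence replaces part of the arbitrary tie-breaking in plain label propagation with a preference for labels held by central vertices.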
Keywords/Search Tags: community detection, large-scale, Spark, label propagation algorithm, parallel