Parallelization Of AP Clustering Community Detection Algorithm Based On Hadoop Platform

Posted on: 2018-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: X J Dong
GTID: 2348330533965871
Subject: Communication and Information System
Abstract/Summary:
Complex networks serve as highly abstract models of complex systems such as power systems, communication networks and the World Wide Web. Community structure reflects the tendency of nodes in a network to aggregate, and it is an important topological attribute of complex networks. Community detection helps in understanding the structure and function of complex networks and in predicting and analyzing network behavior patterns, so it is of great significance for the research and application of such systems. Most existing community detection algorithms pursue accuracy while ignoring the problem of high time complexity, which leaves them poorly suited to large-scale data. To address this problem, this thesis proposes a parallelization method for community detection based on the Hadoop platform. The main work of this thesis is as follows:
1. Basic knowledge of complex networks and existing community detection algorithms was studied. The basic theory of complex networks and the research status of community detection algorithms are introduced, and the underlying ideas and procedures of several popular community detection algorithms are analyzed in detail.
2. The problem of community detection in complex networks can be transformed into a node clustering problem based on the similarity among nodes. This thesis studies AP clustering in depth. Traditional AP clustering computes similarity from the Euclidean distance, which does not take the clustering characteristics of complex networks into account. In this thesis, an improved Jaccard coefficient is therefore used to calculate the similarity among nodes in complex networks.
3. To address the slowness of AP clustering on large-scale data, a step-by-step parallel AP clustering method was designed in combination with the big data analysis tool Hadoop. It parallelizes the whole process of the AP algorithm, and with it AP-based community detection.
A Hadoop cluster was built on PCs, and the methods of this thesis were tested on different data sets. Experimental results indicate that the proposed method improves both the accuracy of community detection and the computational performance of the algorithm, and that better speedup is achieved especially on large-scale data sets.
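The idea in item 2, clustering nodes by a neighborhood-based similarity instead of a distance, can be illustrated with a small single-machine sketch. The abstract specifies neither the improved Jaccard coefficient nor the MapReduce decomposition, so the code below makes its own assumptions: it uses the standard Jaccard coefficient on closed neighborhoods (each node counted as its own neighbor), the textbook serial AP message updates, and the median similarity as the preference. The function names `closed_neighborhoods`, `jaccard`, `similarity_matrix` and `affinity_propagation` are hypothetical and are not taken from the thesis.

```python
from statistics import median


def closed_neighborhoods(edges):
    """Map each node to its neighbor set, including the node itself (assumption)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def jaccard(a, b):
    """Standard Jaccard coefficient |a & b| / |a | b| of two non-empty sets."""
    return len(a & b) / len(a | b)


def similarity_matrix(edges, n):
    """Pairwise Jaccard similarity for nodes 0..n-1; isolated nodes get {node}."""
    nbrs = closed_neighborhoods(edges)
    return [[jaccard(nbrs.get(i, {i}), nbrs.get(j, {j})) for j in range(n)]
            for i in range(n)]


def affinity_propagation(S, preference=None, damping=0.5, iterations=200):
    """Serial AP on a similarity matrix (n >= 2); returns an exemplar index per node."""
    n = len(S)
    S = [row[:] for row in S]  # copy so the caller's matrix is untouched
    if preference is None:
        # Assumed default: median of the off-diagonal similarities.
        preference = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each node picks the candidate exemplar with the largest a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy graph: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
S = similarity_matrix(edges, 6)
labels = affinity_propagation(S)
print(labels)  # nodes sharing an exemplar index form one community
```

Both message updates are per-row or per-column computations over the similarity matrix, which is the kind of structure a step-by-step MapReduce formulation could split across workers. How the thesis actually partitions the work is not stated in the abstract.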
Keywords/Search Tags: complex network, community detection, AP clustering, parallelization