Font Size: a A A

Design And Implementation Of Community Detection Algorithms Baesd On Mapreduce

Posted on:2016-12-03Degree:MasterType:Thesis
Country:ChinaCandidate:Y WangFull Text:PDF
GTID:2308330461982902Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Since human society entered the Internet era, different kinds of social network service platform provide people with a low-cost way to communicate. However, due to the concealment of the information spread on the Internet, the cyber espionage,’water army’ and other hostile forces may exploit the weakness to obtain or propagate information). The network spy, water army and other hostile forces can use it to get or spread information. It’s a so urgent need to design a community detection algorithm to deal with the large amounts of data in social network that we can monitor hostile force communities.Community detection algorithms are generally divided into two categories. One is according to the clustering algorithms in data mining field; the other is according to the related knowledge of graph theory, users on the Internet can be directly abstracted to the nodes in the graph and the relationship between them can be abstracted to the edges in the graph. Thus, we detect communities in network topology.Because the traditional community detection algorithms are not able to meet the requirement of computational complexity, storage capacity and others, it’s not suitable for dealing with large amounts of data in social network. In order to improve the computational efficiency of the algorithms, making them applied to large amounts of data, the distributed platforms are essential to the algorithms. MapReduce is a model of parallel computing, it can be used for big data processing.The key points of this paper are summarized as follows:1) The paper focuses on the research of Map-Reduce programming model and the analysis of some traditional clustering algorithms. Appropriate improvements are made to KMeans clustering algorithm with the randomicity of choosing initial clustering centers and the local optimum of clustering results. The paper presents an algorithm named MRKC (KMeans with Canopy algorithm based on MapReduce) and implements it on MapReduce. On this basis, this paper also analyzes some related clustering algorithms and the overall process of community detection which clustering is applied to, including crawling user data from the social network messages.2) The information crawled from social network may be incomplete because of the permission and other issues. So this paper implements two community detection algorithms based on friends’relationship. According to information of the users’ relationships, the algorithm calculates the similarity between friends to deal with the graph partitioning. Finally by calculating the connected sub-graphs, the algorithm can find the communities.3) The paper also focuses on the research of visualization algorithms of topology and implements the algorithms to display the result of the community detection by the force-direct layout algorithm.4) Design and implement the experimental system of community detection algorithms based on MapReduce and verify the validity of the algorithms.Through the comparison for algorithms of this paper and classical community detection algorithms with the open data, the result shows that algorithms of this paper have better accuracy and scalability. After running the algorithm with large amounts of data of social network on Hadoop cluster, the results of the experiment verify the speedup of above algorithms. It can be applied to the community detection of social network with large amount of data.
Keywords/Search Tags:community detection, MapReduce, Hadoop, graph theory, clustering algorith
PDF Full Text Request
Related items