
Community Detection Based on MapReduce

Posted on: 2014-02-25 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X Cong
GTID: 2248330392460978 | Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the Internet, more and more people are going online. Social networking websites, the most popular Internet application, attract the majority of users: Sina Weibo has reached 300 million users, and the well-known foreign social networking site Twitter has reached 500 million. With so many users, both site managers and users face the same problem: how to find the people related to oneself to interact with, which is what a community means in the traditional sense. Research on community structure plays an important role in understanding and analyzing the structure and function of networks, and it has been widely applied in fields such as biology and sociology. Community detection emerged for this purpose, and researchers have proposed many community mining algorithms from different angles. However, current community detection algorithms are iterative and cannot partition the network's users into independent parts, so they cannot directly adopt a parallel programming model. As a result, they are usually applied to small user populations; as the number of users grows, the amount of computation, and therefore the running time, increases sharply.

MapReduce, as a parallel programming model, is well suited to large data sets and heavy computation. If a traditional community detection algorithm is parallelized with the MapReduce programming model, making good use of the computing power of a cluster to process large volumes of user data, the execution time of the algorithm can be shortened.

This paper proposes a community detection algorithm based on MapReduce. It improves the traditional community detection algorithm based on label propagation and successfully adapts it to the MapReduce programming model.
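The abstract does not give the thesis's MapReduce job design. The following is a minimal single-machine Python sketch, under stated assumptions, of how one synchronous label-propagation round can be expressed as a map step (each node sends its current label to its neighbors), a shuffle (grouping labels by receiving node), and a reduce step (each node adopts its most frequent incoming label). The function names, the tie-breaking rule, and the toy graph are hypothetical, and the sketch shows only a plain synchronous update, not the thesis's improved algorithm.

```python
from collections import Counter, defaultdict

def map_labels(node, neighbors, labels):
    """Map step: a node sends its current label to each of its neighbors."""
    for nb in neighbors:
        yield nb, labels[node]

def reduce_labels(current_label, incoming):
    """Reduce step: adopt the most frequent label among a node's neighbors.

    Tie-breaking (an assumption for this sketch): keep the current label if it
    is among the most frequent ones, otherwise take the smallest candidate.
    """
    counts = Counter(incoming)
    if not counts:
        return current_label  # isolated node keeps its own label
    best = max(counts.values())
    candidates = sorted(lbl for lbl, c in counts.items() if c == best)
    return current_label if current_label in candidates else candidates[0]

def lpa_round(graph, labels):
    """One synchronous round: every node updates from the previous round's labels."""
    shuffled = defaultdict(list)  # simulated shuffle: group labels by receiving node
    for node, nbrs in graph.items():
        for key, value in map_labels(node, nbrs, labels):
            shuffled[key].append(value)
    return {node: reduce_labels(labels[node], shuffled[node]) for node in graph}

def label_propagation(graph, max_rounds=20):
    """Repeat rounds (one MapReduce job each) until labels stop changing."""
    labels = {node: node for node in graph}  # every node starts in its own community
    for _ in range(max_rounds):
        new_labels = lpa_round(graph, labels)
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Hypothetical toy undirected graph: two triangles joined by the edge 3-4.
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
print(label_propagation(graph))  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 4}
```

Purely synchronous updates can oscillate on some graphs (bipartite ones, for example), which is one motivation for mixing in asynchronous updates; the `max_rounds` cap keeps this sketch from looping indefinitely.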
The algorithm preserves the time complexity of the original algorithm while being suitable for parallel computing, and it can efficiently mine high-quality communities from large social networking sites, improving computational efficiency. The main work and contributions of this paper are as follows. First, an experimental data set was collected through the Sina Weibo API; after preprocessing, it contains relationship data for more than 270 users. Second, by combining the synchronous and asynchronous update processes of the label propagation algorithm, we improve the label-propagation-based community detection algorithm and design its data structures, yielding a community detection algorithm based on MapReduce. Third, we analyze the results by computing the network clustering coefficient and the community clustering coefficient, which shows that the algorithm is feasible and effective. Fourth, we compute the clustering coefficients with the MapReduce programming model, shortening the time needed to analyze the results and improving efficiency.
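The abstract does not define the clustering coefficients it uses. A minimal sketch, assuming the standard local clustering coefficient C_i = 2e_i / (k_i(k_i - 1)), where k_i is the degree of node i and e_i is the number of edges among its neighbors. The network clustering coefficient is taken here as the average of C_i over all nodes, and the community clustering coefficient (an assumption) as the same average over the subgraph induced by a community's members. In the map step each node emits every neighbor pair centered on it, plus a marker for each real edge; the reduce step, keyed by node pair, credits a closed triangle to each center whenever the pair is an edge. The shuffle is simulated in memory, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

EDGE = "$"  # marker value meaning "this node pair is an actual edge"

def map_wedges(node, neighbors):
    """Map step: emit each neighbor pair centered at node, plus real-edge markers."""
    for a, b in combinations(sorted(neighbors), 2):
        yield (a, b), node
    for nb in neighbors:
        if node < nb:  # emit each undirected edge once
            yield (node, nb), EDGE

def reduce_pairs(pair, values):
    """Reduce step: if the pair is an edge, every wedge centered on it is closed."""
    if EDGE in values:
        for v in values:
            if v != EDGE:
                yield v, 1

def local_clustering(graph):
    shuffled = defaultdict(list)  # simulated shuffle: group by node pair
    for node, nbrs in graph.items():
        for key, value in map_wedges(node, nbrs):
            shuffled[key].append(value)
    closed = defaultdict(int)  # e_i: edges among the neighbors of node i
    for pair, values in shuffled.items():
        for node, one in reduce_pairs(pair, values):
            closed[node] += one
    coeffs = {}
    for node, nbrs in graph.items():
        k = len(nbrs)
        coeffs[node] = 0.0 if k < 2 else 2.0 * closed[node] / (k * (k - 1))
    return coeffs

def average_clustering(graph):
    coeffs = local_clustering(graph)
    return sum(coeffs.values()) / len(coeffs)

def induced_subgraph(graph, members):
    """Keep only the given nodes and the edges between them."""
    members = set(members)
    return {n: [m for m in graph[n] if m in members] for n in members}

# Hypothetical toy undirected graph: two triangles joined by the edge 3-4.
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
print(round(average_clustering(graph), 4))                   # 0.7778
print(average_clustering(induced_subgraph(graph, [1, 2, 3])))  # 1.0
```

Keying the reduce step on node pairs means no single task needs the whole adjacency structure, which is what lets this kind of computation be split across a cluster.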
Keywords/Search Tags: MapReduce, Community Detection, Parallel Computing, Hadoop, Clustering Coefficient