| As Euler succeeded in solving the seven bridge problem,people’s research on graph theory is also rising and community discovery is dependent on the graph theory.Because the application of graph model can make the network study more clearly,graph theory is widely applied in social network,biological network,information network and so on.In our daily life,various organizations are similar to communities,such as classes,towns,provinces and cities.Along with the rise of the Internet,there is a virtual form community,such as the Internet community.The mining of relevant information in the community has become a hot research direction in all fields.The community can be divided into overlapping communities and non overlapping communities,depending on the feature of whether there are overlapping nodes.The community detection methods correspond to the non overlapping community detection method and the overlapping community detection method.In our daily life,overlapping communities are the most common and the probability of appearances of non overlapping community is relatively small,so the study of overlapping communities is more extensive.But the study of non overlapping communities provides a theoretical basis for the study of overlapping communities,many overlapping community detection methods are proposed on the basis of non overlapping community detection methods.Palla et al.firstly discovered and proposed the phenomenon of community overlap,and proposed a method for overlapping community detection.For overlapping communities,there are many methods,such as methods based on group infiltration,methods based on point clustering,methods based on edge clustering.LinkSCAN algorithm is a typical type of methods based on edge clustering.Because it can map the origin graph into link-space graph,the structure of the origin graph can be accurately passed to link-space graph,which makes the clustering results more real and can be more clearly to divide the overlapping community.The traditional LinkSCAN algorithm only maps the origin graph once,and carries out the clustering operation on the basis of it.The Multi-LinkSCAN algorithm proposed in this paper is able to map the origin graph several times on the basis of the traditional algorithm through a custom way,and cluster the link-space graph on the final layer,so as to get more accurate division effect of community.But the link-space graph obtained by LinkSCAN algorithm and Multi-LinkSCAN algorithm increases in the number of points and edges,which makes the time computational complexity for the similarity matrix,Euclidean distance and hierarchical clustering much higher than the origin graph,resulting in runtime consumption surge.At this time,we make the distributed improvements of the computations of matrix multiplication,adjacent matrix,similarity matrix,Euclidean distance and hierarchical clustering through a distributed parallel computation framework Spark,in order to improve the computational efficiency.Meanwhile,this article uses E-MapReduce product of Ali cloud to build Spark cluster,which is better than the ordinary physical Spark cluster in terms of ease of use,cost,resource integration and security,and provides a perfect platform foundation for the smooth development of the experiment. |