Font Size: a A A

Detecting And Analyzing Community Structure In Large-scale Dynamic Social Network

Posted on:2010-11-18Degree:MasterType:Thesis
Country:ChinaCandidate:H B MaFull Text:PDF
GTID:2120360272997186Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Many systems in real world exist in the form of networks, or can be modeled as networks, as these networks are highly complex, so-called"complex networks". Complex networks research has become one of the most important interdisciplinary research area. As the basic statistical properties such as the small-world and the scale-free nature, network community structure is one of the most fundamental and important topological properties of complex networks, within which the links between nodes are very dense, but between which they are quite sparse. The research of community detecting in complex networks (or complexnetwork clustering) is not only of great theoretical significance for the analysis topology, to understand the function of complex networks, but also has broad application prospects.This paper studies the community detecting and analyzing methods in large-scale dynamic social networks, the main research contents as following.Using Enron E-mail data set as an example, We use time-related analysis methods, build networks from dynamic data set. The data at different times are built into multiple networks.In this paper, we analyzed nine representative kinds of complex network clustering algorithms. Five kinds of algorithms based on optimization methods: Spectral method, KL algorithm, FN algorithm, CNM algorithm, and GA algorithm; four kinds of heuristic methods: GN algorithm, improved GN algorithm, CPM algorithm, and FEC algorithm. We described the principles used in these algorithms and the main algorithm flow steps in detail, and tested the clustering accuracy and running time through experiment. The conclusion as following: 1) For the random network RN (C, s, d, pin), the clustering accuracy of the algorithms increases with the increasing value of pin. 2) In the case that the community structure is very clear (pin >0.7), all algorithms have a very good clustering accuracy (pin >97%), but in the other case that the community structure is not obvious (pin <0.5), the majority algorithms have low clustering accuracy. 3) For random network that those structure we have known, algorithms based on optimization method (FN algorithm excepted) show better clustering accuracy than algorithms based on heuristic method (FEC, CPM, GN). 4) Compared to the classic space clustering algorithms (k-means, c-means), the clustering accuracy of complex network clustering algorithms is not bad, but some algorithms (CPM, FEC, spectral methods) have a clear asset. That shows complex network clustering algorithms is not only suitable for the treatment of clustering in relational data, such as social networks, biological networks, but also can effectively solve the traditional problem of spatial clustering. 5) The clustering accuracy of the algorithms is closely related to the data sets. Algorithms have generally higher clustering accuracy for the data sets with better community structure (Irirs, Wine), and generally low accuracy for the data sets with not very good community structure (Image). But for the date set with better community structure (Irirs), GA algorithm have the lowest clustering accuracy (less than 60%), this indicates that the Q function has bias, result in these algorithms based on optimizing Q function have low clustering accuracy. 6) GA algorithm has the best clustering accuracy to the isomorphism networks (the degree of vertex and the size of community are of approximation), but its calculation speed is very slow, and only suitable for treatment to small-scale complex networks; two spectral methods (A-cut, N-cut) have higher clustering accuracy, but they are matrix computing-based, not suitable for handling large-scale network, only for small and medium-scale networks; FEC, CPM have higher clustering accuracy (both for random networks and real data sets) and very good performance in operating efficiency, suitable for treatment to large-scale complex networks.Most of these representative algorithms need to know the overall structure of the network. But for many large scale and dynamic networks, such as the World Wide Web, it is impossible to know the overall structure of the networks. More importantly, in many practical applications, there is unnecessary to find all the community in the networks, we want only to know the structure of a local community. In these circumstances, we need not to look for all the communities costing too much. Therefore study for the local community structure when we not fully understand the overall structure of the networks is very necessary.The method we designed for detecting a local community in complex network simulates the growth process of a community. The steps of our method are as following: beginning from a node v0 as an initial cluster, considering the out-cluster-degree, in-cluster-degree, and status of its neighbor nodes, in accordance with a kind of selection strategy, we add a new node of these neighbors to the cluster, then the neighbors of the new node we add would be considered too in the next step. In this way, we can add a new node to the cluster in each step, and find the next growth direction. Repeating the process can enable the cluster grow different.directions. When growing to its age, cluster can add new nodes in accordance with another kind of selection strategy. Because we control the growth status of the nodes, we can know when and where the cluster reaches its edge nodes.While add a new node to the cluster, we only consider the status of its neighbor nodes, the time complexity of selecting a new node to the cluster is O(k), where k is the average node degree of the network, so the complexity of finding a local community is O(kd), where d is the size of the community. The algorithm testing experiments show that: for the random networks RN (C, s, d, pin), the algorithm we designed has equivalent clustering accuracy with FN and CPM algorithms; for the real data sets such as image, irirs and wine, it shows good performance, for large complex networks it also can mine a cluster fastly. The major statistical parameters in complex networks are: Betweenness Centrality, Closeness Centrality, Eigenvector Centrality, Degree Centrality, Strong Components. Those reflect in some extent that paths by the node information can flow from any one node to any other node in this community, steps for information to get from any node to any other node in the community, how cohesive the community is, connections of a node to other nodes, the connectivity of a complex network. For nodes in complex social networks, those have high value of Betweenness Centrality, Closeness Centrality, Eigenvector Centrality, Degree Centrality are more likely to be the key players of the network. Do the statistics analysis to these parameters, and using the visual network tool, we can mine the central roles and the special cluster structures in the networks against the special background.With the expansion of application fields and the diversity of network clustering problems, the existing network clustering methods have been not fit for those problems, so there is need to design new network clustering methods for special type of complex networks. "Flow"-type complex network clustering methods, high-dimensional complex network clustering methods, and the distributed complex network clustering methods need for further research.
Keywords/Search Tags:complex social network, network community structure, complex network clustering method, complex network analysis, dynamic network, enron dataset
PDF Full Text Request
Related items