
Research On Community Detection And Topic Modeling Technologies In Social Networks

Posted on: 2014-02-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D S Duan
GTID: 1228330398987658
Subject: Computer application technology
Abstract/Summary:
With the emergence of social networking platforms, users can not only establish links with each other but also generate rich text content. Community detection is one of the most important link-mining techniques, while topic modeling is one of the main tools for text mining in social networks. To analyze links and texts in social networks in a unified way, it is important to study community detection and topic modeling in greater depth and to combine them naturally. However, real social networks are often large-scale, dynamic, and rapidly updated, and they contain both links and texts. These characteristics pose new challenges for community detection and topic modeling.
For community detection, the Newman algorithm, which is agglomerative and based on modularity optimization, is inefficient. To address this, a heuristic algorithm called OBO-Group (short for One-by-One Group) is proposed to quickly construct compact communities in a single scan. OBO-Group avoids the large volume of computation in the initial merging steps of the Newman algorithm and thereby significantly improves efficiency. Building on OBO-Group, a static community detection algorithm, S-Group (Static Group), is proposed for weighted directed networks. Experiments on real and synthetic networks show that S-Group is much more efficient than the Newman algorithm while achieving effectiveness close to that of the Newman algorithm.
To take the dynamics of social networks into account, a community change-point detection algorithm, Stream-Group, is proposed. Experiments on synthetic dynamic networks and the Enron email network show that Stream-Group can effectively discover community change points in social networks.
To handle the fast-updating nature of social networks, an incremental K-Clique clustering algorithm is proposed.
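For K = 2, a K-clique is a single edge, and two 2-cliques are adjacent whenever they share a node, so 2-Clique clusters are exactly the connected components of the graph (ignoring isolated nodes). The following is a minimal sketch of maintaining these clusters incrementally as edges arrive. It assumes an insertion-only edge stream and uses a union-find (disjoint-set) structure. This is a swapped-in technique, not the dissertation's local depth-first-search-forest update, and unlike that method it cannot handle edge deletions. The class and method names are hypothetical.

```python
from collections import defaultdict


class IncrementalComponents:
    """Hypothetical helper: maintains connected components (2-Clique clusters)
    as edges arrive. Insertion-only; edge deletions are NOT supported."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def _find(self, x):
        # Add unseen nodes lazily as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Path halving: point nodes closer to the root while searching.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def add_edge(self, u, v):
        # Each new edge is a new 2-clique; merge the clusters of its endpoints.
        ru, rv = self._find(u), self._find(v)
        if ru == rv:
            return False  # already in the same cluster: no structural change
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru  # union by size keeps the trees shallow
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        return True  # two clusters were merged

    def clusters(self):
        # Group every known node under its root and return sorted clusters.
        groups = defaultdict(set)
        for node in list(self.parent):
            groups[self._find(node)].add(node)
        return sorted(sorted(g) for g in groups.values())


cc = IncrementalComponents()
for u, v in [("a", "b"), ("c", "d"), ("b", "c"), ("e", "f")]:
    cc.add_edge(u, v)
print(cc.clusters())  # [['a', 'b', 'c', 'd'], ['e', 'f']]
```

Each insertion touches only the two affected clusters, which is the same locality idea the abstract describes, although a deletion-capable method such as the DFS-forest approach needs considerably more bookkeeping.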
As a special case of incremental K-Clique clustering, incremental 2-Clique clustering is converted into a problem of locally updating a depth-first search forest. A series of local updating strategies is designed to narrow the update range as much as possible, enabling highly efficient incremental computation without loss of clustering accuracy. Incremental 2-Clique clustering is then generalized to the cases of K ≥ 3. Experimental results on a co-author network and the Enron email network show that the incremental K-Clique clustering algorithms are much more efficient than the corresponding static algorithms. Compared with incremental spectral clustering, incremental K-Clique clustering is faster and does not accumulate error. In contrast to clustering based on network snapshots, incremental K-Clique clustering can uncover many details of cluster evolution in social networks.
In terms of topic modeling, a ranking-based topic model, RankTopic, is studied, which incorporates the link-based importance of texts to improve topic modeling performance. Traditional topic models regard all texts as equally important, whereas in real social networks texts may vary in importance, so treating them as equally important may inherently hurt topic modeling performance. Extensive experiments on a paper citation network and Twitter data show that RankTopic outperforms baseline topic models in generalization performance and in document clustering and classification performance. The topics detected by RankTopic are also more interpretable than those detected by the baseline models.
Most existing models do not consider the coordination effect between communities and topics. Some model only communities or only topics and therefore cannot detect both simultaneously. Others model communities and topics with the same variable, which is not flexible enough for modeling real social networks.
To address these issues, a mutual enhanced infinite (MEI) community-topic model is proposed, which incorporates communities and topics into a unified probabilistic generative model so that both can be detected simultaneously. MEI explicitly models communities and topics with different variables and correlates them through a community-topic distribution. This makes community and topic detection flexible and allows each process to enhance the other. A Dirichlet Process Mixture (DPM) and a Hierarchical Dirichlet Process (HDP) are used in MEI to automatically determine the numbers of communities and topics. Experimental results show that MEI achieves better generalization performance than existing models and can automatically detect the numbers of communities and topics.
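The way a Dirichlet process leaves the number of clusters open can be illustrated with its Chinese restaurant process (CRP) representation. Item i joins an existing cluster with probability proportional to that cluster's current size, or opens a new cluster with probability proportional to a concentration parameter alpha. The following is a minimal sketch that only samples cluster assignments from the CRP prior. It is not MEI's inference procedure, and the function name and the value of `alpha` are illustrative assumptions.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Hypothetical helper: sample cluster labels for n_items from a
    Chinese restaurant process prior with concentration alpha."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently in cluster k
    labels = []
    for _ in range(n_items):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(n_items=200, alpha=2.0)
print(len(counts), "clusters, sizes:", counts)
```

The number of clusters is not fixed in advance. Under this prior its expected value grows only logarithmically with the number of items, and a larger alpha favors more clusters. In a full DPM or HDP model, the data likelihood then reshapes these prior assignments during inference.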
Keywords/Search Tags: community detection, topic modeling, social network, incremental clustering, ranking, Dirichlet process, modularity, generalization performance
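Modularity, one of the keywords above, is the objective that the agglomerative Newman algorithm greedily maximizes. For an undirected, unweighted graph with m edges, it can be written per community as Q = Σ_c [ L_c / m − (d_c / 2m)² ], where L_c is the number of edges inside community c and d_c is the total degree of its nodes. The following is a minimal sketch that scores a given partition under that formula. It covers only the undirected, unweighted case, not the weighted directed networks that S-Group targets, and the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Hypothetical helper: Newman modularity of a partition of an undirected,
    unweighted graph. Assumes edges has no duplicates or self-loops.
    community maps each node to a community label."""
    m = len(edges)
    intra = defaultdict(int)   # L_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 4))  # 0.3571
```

Putting every node in one community always gives Q = 0, so a positive score means the partition has more internal edges than a degree-preserving random graph would be expected to produce. An agglomerative method repeatedly merges the pair of communities with the largest gain in Q.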