Research On Community Detection Algorithm And Community Feature In Large-scale Heterogeneous Information Network

Posted on: 2018-03-29 | Degree: Master | Type: Thesis
Country: China | Candidate: Z L Zhang
GTID: 2348330518996908 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the volume of information has grown explosively, giving rise to a variety of information networks, such as academic networks and social networks (e.g., Facebook). In the study of information networks, community detection has attracted widespread attention from researchers. Some researchers construct homogeneous information networks to find community structures; for example, co-author communities are found by constructing co-author networks. However, a homogeneous information network contains only a single type of entity and a single type of relation, so it cannot reflect both the network topology information and the entities' semantic information, which lowers the accuracy of the detected communities. A heterogeneous information network contains multiple types of entities and relations, and the main problem it poses is the complexity of community detection caused by its large scale and heterogeneity. To address this issue, some researchers construct multiplex networks to detect community structure. Although this approach handles the complexity caused by scale and heterogeneity, it can only find communities of a single node type. Other researchers have proposed overlapping community detection methods based on probabilistic models and matrix factorization; these meet the requirement of heterogeneity but, because of their high time and space complexity, do not scale to large networks. Moreover, overlapping communities are a significant feature of real networks: a node may belong to multiple communities, so a community detection algorithm must be able to discover overlapping community structure effectively.
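The co-author construction mentioned above can be illustrated with a minimal sketch. It assumes the academic network is given as a mapping from paper IDs to author lists; the function name `coauthor_network`, the edge weighting (number of shared papers), and the sample data are hypothetical and not taken from the thesis.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_network(paper_authors):
    """Project {paper: [authors]} onto a weighted co-author graph.

    Returns {author: {coauthor: number_of_shared_papers}}.
    Authors who only appear on single-author papers get no entry.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for authors in paper_authors.values():
        # Each unordered pair of distinct authors on a paper shares one edge.
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {a: dict(nbrs) for a, nbrs in graph.items()}


# Hypothetical toy data: three papers, four authors.
papers = {
    "p1": ["Ann", "Bob"],
    "p2": ["Ann", "Bob", "Cid"],
    "p3": ["Cid", "Dee"],
}
net = coauthor_network(papers)
```

The projection keeps only author nodes, which is the limitation the abstract points out: paper, keyword, and venue information is collapsed into edge weights.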
Once the community structure of a heterogeneous information network has been discovered, another important problem is how to describe the communities so that they can be better presented and analyzed. It is therefore necessary to study both overlapping community detection algorithms and community features for large-scale heterogeneous information networks. To detect overlapping communities in such networks accurately and efficiently, this paper first proposes an overlapping community detection algorithm based on label propagation with neighbor node influence. The algorithm has linear time complexity and is suitable for large-scale homogeneous information networks. Then, to handle the large scale and heterogeneity of heterogeneous information networks, this paper carries the algorithm over from homogeneous to heterogeneous information networks, combines network topology information with semantic information, and proposes an overlapping community detection algorithm for heterogeneous information networks based on multiplex extraction and seed-centric communities. This algorithm is applicable to heterogeneous information networks of arbitrary form; depending on the user's research needs, different types of central nodes can be selected, yielding different community partitions.
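The abstract does not spell out the label propagation rule, so the following is only a rough sketch of the general idea, not the thesis's algorithm. It follows the well-known COPRA scheme, in which each node holds several labels with belonging coefficients, and it assumes, purely for illustration, that a neighbor's influence equals its degree. The names `overlapping_lpa`, `max_memberships`, and `influence` are hypothetical.

```python
from collections import defaultdict


def overlapping_lpa(graph, max_memberships=2, iterations=20):
    """COPRA-style overlapping label propagation with weighted neighbours.

    graph: {node: list of neighbours}, undirected.
    Returns {node: {label: belonging_coefficient}}, coefficients sum to 1.
    """
    threshold = 1.0 / max_memberships
    # Assumption: a neighbour's influence is simply its degree.
    influence = {n: len(graph[n]) for n in graph}
    labels = {n: {n: 1.0} for n in graph}  # each node starts as its own community

    for _ in range(iterations):
        updated = {}
        for node in graph:
            # Collect neighbours' labels, weighted by influence.
            scores = defaultdict(float)
            for nbr in graph[node]:
                for label, coeff in labels[nbr].items():
                    scores[label] += influence[nbr] * coeff
            if not scores:  # isolated node keeps its current label
                updated[node] = labels[node]
                continue
            total = sum(scores.values())
            scores = {l: s / total for l, s in scores.items()}
            # Drop weak labels; at most max_memberships can survive.
            kept = {l: s for l, s in scores.items() if s >= threshold}
            if not kept:  # none survive: keep the strongest, ties by label order
                best = max(sorted(scores, key=str), key=scores.get)
                kept = {best: scores[best]}
            norm = sum(kept.values())
            updated[node] = {l: s / norm for l, s in kept.items()}
        if updated == labels:  # converged
            break
        labels = updated
    return labels


# Hypothetical toy graph: two disconnected triangles.
demo = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
memberships = overlapping_lpa(demo)
```

Each pass touches every edge once, so an iteration costs time proportional to the number of edges times `max_memberships`, which is the kind of linear behaviour the abstract claims. The synchronous update used here is a simplification and can oscillate on some graphs, which is why the iteration count is capped.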
Finally, for heterogeneous academic networks, this paper introduces the keyword distribution of papers, the time distribution of papers, and the distribution of authors' fields within a community, together with the citation strength between communities, to present and analyze community features. In the experiments, the proposed overlapping community detection algorithm based on label propagation with neighbor node influence is first evaluated on real-world networks and LFR benchmark networks using overlapping modularity, normalized mutual information (NMI), and F-score. The results show that the algorithm achieves high accuracy and stability, and that its linear time complexity makes it applicable to large-scale homogeneous information networks. Then, on a real heterogeneous academic network, the proposed algorithm based on multiplex extraction and seed-centric communities is evaluated using paper keyword relations, topic similarity of papers, and author relations. The results show that it effectively improves the accuracy of the communities detected on large-scale heterogeneous information networks and also has linear time complexity, which means a low time cost. In addition, the introduced community characterization methods are effective in indicating and describing the community features of a heterogeneous academic network.
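The abstract names F-score as an evaluation measure without defining it. One common choice for overlapping communities is the symmetric average of best-match F1 scores between the detected and ground-truth covers; the sketch below assumes that definition, and the helper names `f1` and `average_f1` as well as the sample communities are hypothetical.

```python
def f1(found, truth):
    """F1 score between two node sets."""
    overlap = len(found & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(found)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)


def average_f1(detected, ground_truth):
    """Symmetric average of best-match F1 between two community covers."""
    def one_way(src, dst):
        # For each community in src, take its best F1 against any in dst.
        return sum(max(f1(c, d) for d in dst) for c in src) / len(src)

    return 0.5 * (one_way(detected, ground_truth) + one_way(ground_truth, detected))


# Hypothetical covers: node 4 belongs to two true communities.
truth = [{1, 2, 3, 4}, {4, 5, 6}]
detected = [{1, 2, 3}, {4, 5, 6}]
score = average_f1(detected, truth)  # 13/14, about 0.929
```

Because communities are compared as sets, a node that belongs to several communities is handled naturally, which is what makes this measure usable for overlapping covers.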
Keywords/Search Tags: heterogeneous information network, large-scale, overlapping community detection, community feature