
Research On Community Discovery Methods In Heterogeneous Information Networks

Posted on: 2021-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: X N Zhang
GTID: 2370330629482572
Subject: Computer technology
Abstract/Summary:
As an important research method in data mining, community discovery can uncover hidden information in a network, which gives it considerable research value in product recommendation, advertising, and public opinion monitoring. At present, however, most research on community discovery methods is carried out on homogeneous networks, in which all nodes are defined as the same type, and good results have been achieved there. In real life, most networks are heterogeneous: their nodes and edges are of multiple types. Heterogeneous networks have attracted increasing attention in recent years because they are consistent with real-world networks, but their many types of nodes and links make the network extremely complex, which poses great challenges for community discovery. This thesis therefore makes an in-depth study of community discovery methods for heterogeneous networks. The main research contents are as follows:

1. Construction of a heterogeneous network model and processing of its nodes. The heterogeneous network is modeled as a hypergraph. A hypergraph can express different types of nodes and edges with different semantics in one network, so it can represent the multiple node types and complex relationships of a heterogeneous network. In addition, the DeepWalk network representation learning algorithm is applied to the hypergraph model to map the heterogeneous network nodes it represents into a low-dimensional, dense vector space.

2. An improved K-means algorithm for community discovery. K-means is a classical and widely used community discovery algorithm: its idea is simple and easy to understand and learn, and it is relatively easy to implement, so it is favored by many researchers. However, K-means is very sensitive to, and heavily dependent on, the initial cluster centers. In traditional K-means the initial centers are chosen at random, which easily leads to a locally optimal community partition and inaccurate results. This thesis therefore proposes a cluster center selection method based on a density Gini coefficient, which selects the initial cluster centers by computing the local density of nodes.

NMI and Precision are used as evaluation metrics, and the improved K-means algorithm is compared with other community partition methods for heterogeneous networks. The results show that the improved K-means algorithm improves both NMI and Precision. Simulation experiments show that the improved K-means algorithm can handle heterogeneous information networks. At the same time, compared with the original K-means algorithm, the cluster centers can be obtained without iteration, which reduces the complexity of the algorithm. Finally, experimental verification shows that the improved K-means algorithm is feasible and effective.
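The hypergraph-plus-DeepWalk step can be illustrated with a minimal sketch of how random-walk "sentences" might be generated on a hypergraph. This is an assumption-laden illustration, not the thesis's implementation: the hypergraph is assumed to be a list of hyperedges, each a set of typed node IDs such as `author:a2`, and a walk is assumed to move from a node to a uniformly chosen incident hyperedge and then to a uniformly chosen other member of that hyperedge. The function names `build_incidence` and `hypergraph_walks`, the parameters, and the toy data are all hypothetical. As in DeepWalk, the walks would then be fed to a skip-gram model (for example gensim's `Word2Vec` with `sg=1`) to obtain the low-dimensional vectors; that training step is omitted here.

```python
import random
from collections import defaultdict


def build_incidence(hyperedges):
    """Map each node to the indices of the hyperedges that contain it."""
    incidence = defaultdict(list)
    for idx, edge in enumerate(hyperedges):
        for node in sorted(edge):
            incidence[node].append(idx)
    return incidence


def hypergraph_walks(hyperedges, num_walks=10, walk_length=8, seed=0):
    """Truncated random walks: node -> random incident hyperedge -> random other member."""
    rng = random.Random(seed)
    incidence = build_incidence(hyperedges)
    nodes = sorted(incidence)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # like DeepWalk, visit start nodes in a new random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                current = walk[-1]
                edge = hyperedges[rng.choice(incidence[current])]
                # sorted() keeps the walk reproducible regardless of set iteration order
                candidates = sorted(n for n in edge if n != current)
                if not candidates:  # singleton hyperedge: the walk cannot continue
                    break
                walk.append(rng.choice(candidates))
            walks.append(walk)
    return walks


# Hypothetical toy data: each hyperedge joins a paper, its authors, and its venue.
hyperedges = [
    {"paper:p1", "author:a1", "author:a2", "venue:v1"},
    {"paper:p2", "author:a2", "venue:v1"},
    {"paper:p3", "author:a3", "venue:v2"},
]
walks = hypergraph_walks(hyperedges, num_walks=2, walk_length=5)
```

Because one hyperedge can join nodes of several types at once, a single step can move from a paper to an author or a venue, which is how the walk corpus carries the heterogeneous, multi-type relations into the embedding.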
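The idea of replacing random initialization with density-driven center selection can be sketched as follows. The abstract does not give the formula for the density Gini coefficient, so this example substitutes a simpler density-peaks-style rule, which is an assumption and not the author's method: the local density ρ_i counts the other nodes within a cutoff distance, the densest node becomes the first center, and each further center maximizes (ρ_i + 1) times its distance to the nearest center already chosen. Standard K-means iterations then start from those centers. The names `local_density`, `density_initial_centers`, `kmeans`, the `cutoff` parameter, and the toy 2-D vectors (standing in for DeepWalk embeddings) are hypothetical, and `k` is assumed not to exceed the number of points.

```python
import math


def local_density(points, cutoff):
    """rho_i: number of other points within distance `cutoff` of point i."""
    n = len(points)
    return [
        sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) <= cutoff)
        for i in range(n)
    ]


def density_initial_centers(points, k, cutoff):
    """Pick k initial centers deterministically from local density (stand-in rule)."""
    rho = local_density(points, cutoff)
    chosen = [max(range(len(points)), key=lambda i: rho[i])]  # densest point first
    while len(chosen) < k:
        remaining = [i for i in range(len(points)) if i not in chosen]
        # Favor points that are both dense and far from every center chosen so far;
        # the +1 keeps isolated points from scoring zero.
        chosen.append(max(
            remaining,
            key=lambda i: (rho[i] + 1) * min(math.dist(points[i], points[c]) for c in chosen),
        ))
    return [tuple(points[i]) for i in chosen]


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's previous center
        if new_centers == centers:  # assignments are stable, so stop
            break
        centers = new_centers
    return labels, centers


# Hypothetical toy vectors standing in for node embeddings.
embeddings = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
init = density_initial_centers(embeddings, k=2, cutoff=0.5)
labels, centers = kmeans(embeddings, init)
```

Since the initial centers are derived deterministically from the data rather than drawn at random, repeated runs on the same embeddings yield the same partition, which is the kind of stability the random-initialization problem described above calls for.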
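For reference, the NMI metric used in the evaluation can be computed from two label lists as below. The abstract does not say which NMI normalization, or which definition of Precision, the thesis uses; this sketch assumes the arithmetic-mean normalization common in community detection, NMI(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and does not implement Precision. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a partition given as a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two partitions of the same nodes."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((nij / n) * math.log(nij * n / (ca[i] * cb[j])) for (i, j), nij in joint.items())


def nmi(truth, predicted):
    """NMI with arithmetic-mean normalization: 2 I(X;Y) / (H(X) + H(Y))."""
    h_sum = entropy(truth) + entropy(predicted)
    if h_sum == 0:
        return 1.0  # both partitions put every node in one community
    return 2 * mutual_information(truth, predicted) / h_sum


truth = [0, 0, 0, 1, 1, 1]
same_split = nmi(truth, [1, 1, 1, 0, 0, 0])  # identical split, labels swapped
one_block = nmi(truth, [0, 0, 0, 0, 0, 0])   # everything lumped into one community
```

NMI is invariant to how community labels are numbered, so a predicted partition that matches the ground truth up to relabeling scores 1, while a partition carrying no information about it scores 0.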
Keywords: Heterogeneous network, K-means, Hypergraph, DeepWalk