
Research And Implementation Of Clustering Algorithm Based On Heterogeneous Information Network

Posted on: 2022-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: B H Zhang
Full Text: PDF
GTID: 2510306605966989
Subject: Master of Engineering
Abstract/Summary:
We live in an interconnected world in which many complex systems from nature and society can be effectively modeled as networks. In a network, each node represents an individual and each link represents a relationship between two individuals. Networks are ubiquitous in real life and take many forms. Compared with a traditional homogeneous information network, a heterogeneous information network is composed of multiple types of nodes and multiple types of edges, which gives it stronger representational power. Clustering, a main task of data mining, groups nodes so that connectivity between nodes in the same group is strong while connectivity between nodes in different groups is weak. Clustering a network sheds light on its topology: it helps us understand the underlying structure and reveals the network's structure-function relations. It is therefore of great significance to use the rich semantic information carried by the nodes and edges of a heterogeneous information network for cluster analysis.

However, many traditional clustering algorithms were designed for homogeneous information networks. Because heterogeneous information networks contain more complex information, the heterogeneity of nodes and edges makes node similarity difficult to characterize and measure, so these algorithms are difficult to apply directly to heterogeneous information networks. Existing clustering algorithms for heterogeneous information networks also have problems. First, the choice of meta-path is somewhat subjective. Second, feature extraction and clustering are carried out independently, which makes it difficult to describe the clustering structure fully.

To solve these problems, we propose GEjNMF, a heterogeneous information network clustering algorithm based on joint Graph Embedding and Nonnegative Matrix Factorization, in which feature extraction and clustering are learned simultaneously by exploiting the graph embedding and latent structure of the network. GEjNMF first transforms the original heterogeneous information network into an embedded representation matrix of multiple center-type nodes with respect to attribute-type nodes. It then adopts a joint learning method that combines feature selection with the clustering division of nodes, so that the features obtained for the center-type nodes are better suited to producing a good clustering division. In other words, features are selected under the supervision of clustering, which improves the performance of the algorithm. In addition, a regularization term integrates the homogeneous relationships between nodes into GEjNMF. We formulate the objective function of GEjNMF and transform the heterogeneous information network clustering problem into a constrained optimization problem, which is solved effectively by l0-norm optimization. The advantage of GEjNMF is that features are selected under the guidance of clustering, which both improves performance and reduces running time.

In the experiments, we compared GEjNMF with seven representative algorithms on three standard heterogeneous information network datasets. The results demonstrate that GEjNMF achieves the best performance with the least running time among the compared state-of-the-art methods. Furthermore, the proposed algorithm is robust across heterogeneous information networks from various fields.
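The idea of factorizing a node-by-attribute matrix while a regularization term pulls linked nodes toward the same cluster can be illustrated with a generic graph-regularized nonnegative matrix factorization. The sketch below is not the GEjNMF algorithm from the thesis: it omits the joint feature selection and the l0-norm step, and it uses the standard multiplicative update rules for graph-regularized NMF instead. The function names, the parameter `lam`, and the toy matrices `X` (center-type nodes by attribute-type nodes) and `W` (homogeneous links between center-type nodes) are all assumptions made for illustration.

```python
import numpy as np


def gnmf_objective(X, W, H, B, lam):
    """||X - H B||_F^2 + lam * Tr(H^T L H), with graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.norm(X - H @ B) ** 2 + lam * np.trace(H.T @ L @ H)


def graph_regularized_nmf(X, W, k, lam=0.1, n_iter=200, seed=0, eps=1e-10):
    """Generic graph-regularized NMF (illustrative, not the thesis method).

    X : (n, d) nonnegative node-by-attribute matrix.
    W : (n, n) symmetric nonnegative adjacency between the n nodes.
    Returns H (n, k) soft cluster memberships and B (k, d) cluster profiles.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = rng.random((n, k)) + eps
    B = rng.random((k, d)) + eps
    D = np.diag(W.sum(axis=1))  # degree matrix of the homogeneous links
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of H and B nonnegative.
        B *= (H.T @ X) / (H.T @ H @ B + eps)
        H *= (X @ B.T + lam * (W @ H)) / (H @ B @ B.T + lam * (D @ H) + eps)
    return H, B


# Toy data (invented): six nodes, four attributes, two obvious groups.
X = np.array([[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3, dtype=float)
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0  # nodes in the same group are linked

H, B = graph_regularized_nmf(X, W, k=2, lam=0.1)
labels = H.argmax(axis=1)  # hard assignment: strongest membership per node
print(labels)
```

Because the factorization and the link-based penalty share a single objective, the membership matrix `H` is shaped by both the attribute structure and the homogeneous links, which is the general motivation for learning features and clusters jointly.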
Keywords/Search Tags:Heterogeneous Information Network, Non-negative Matrix Factorization, Joint Learning, Clustering Analysis