
Co-Clustering Of Heterogeneous Information Network Based On Meta Path

Posted on: 2019-10-30 | Degree: Master | Type: Thesis
Country: China | Candidate: M Han | Full Text: PDF
GTID: 2428330548459208 | Subject: Computer technology
Abstract/Summary:
Network-structured data can effectively express the relationships among different types of nodes and edges, and it is very common in daily life. For example, the Internet consists of web pages and captures the links between them; social networks such as "Weibo" and "Today's Headlines" capture the social relationships among people. As a result, a very important information carrier has emerged: the Heterogeneous Information Network (HIN). Compared with the homogeneous information network, which has been researched extensively, the nodes and links in an HIN carry rich structural and semantic information, so a growing body of research is paying attention to HINs.

Clustering analysis, an important research tool in data mining, can uncover the latent organizational structure of data points. Clustering directly on an HIN, instead of first converting it into homogeneous information networks, has several clear advantages. First, an HIN fuses a wide range of information, so the clustering results for different types of nodes can reinforce each other. Second, clustering directly on the HIN reduces the loss of semantic and structural information and makes the clustering result more accurate.

However, existing HIN clustering algorithms share some common problems. First, their transferability is weak: they cannot be applied to an arbitrary HIN. Second, they cluster using only edge information or only generated content; focusing on the relationships alone or on the generated content alone makes it difficult to achieve an ideal clustering result. Third, the similarity information among nodes of the same type is not used, so the clustering result is not smooth.

To address these problems, this thesis proposes a meta-path-based HIN co-clustering algorithm, R-NetNMTF. Compared with existing HIN clustering algorithms, R-NetNMTF has the following advantages. First, it organizes the HIN into a star-schema network in which the center-type nodes act as the hub and are connected to the subordinate-type nodes, turning the semi-structured HIN into a structured representation. Since any HIN can be organized into a star schema, R-NetNMTF transfers well across networks. Second, by integrating the links between the different types of nodes with the generated content information, it effectively combines edges and content to coordinate the clustering result. Non-negative matrix tri-factorization is applied simultaneously to multiple non-negative data matrices to achieve soft co-clustering of all node types in the HIN. Each element of the cluster indicator matrix gives the strength with which a node belongs to a cluster. Compared with the output of a hard-clustering algorithm, R-NetNMTF therefore reflects not only the cluster assignment of each node but also its fuzzy membership, which makes results without clear cluster boundaries more interpretable. Finally, R-NetNMTF uses the similarity of nodes of the same type as a regularization term, and also uses the center-type nodes connected to a subordinate-type node as a similarity measure, to exploit the geometric structure of the data space and make the global clustering result of the HIN smoother.

Experimental results on real data sets demonstrate the validity and correctness of R-NetNMTF, and its clustering results are superior to those of existing similar algorithms.
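
The abstract does not state the R-NetNMTF objective or its update rules, so the following is only a minimal sketch of generic non-negative matrix tri-factorization on a single relation matrix, with an optional graph regularizer over nodes of the same type. It uses the standard multiplicative updates for the Frobenius-norm objective. The function names (`nmtf`, `soft_memberships`), the parameters, and the toy data are assumptions for illustration and are not taken from the thesis, which factorizes several matrices of a star schema jointly.

```python
import numpy as np


def nmtf(X, k_rows, k_cols, W=None, lam=0.0, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix X (n x m) as F @ S @ G.T.

    F (n x k_rows) and G (m x k_cols) hold soft cluster memberships for the
    row nodes and column nodes; S (k_rows x k_cols) links row clusters to
    column clusters. W is an optional non-negative symmetric similarity
    matrix over the row nodes (an assumption of this sketch); it adds the
    penalty lam * trace(F.T @ (D - W) @ F), where D is the degree matrix.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_rows)) + eps
    S = rng.random((k_rows, k_cols)) + eps
    G = rng.random((m, k_cols)) + eps
    if W is None:
        W = np.zeros((n, n))
    D = np.diag(W.sum(axis=1))

    for _ in range(n_iter):
        # Multiplicative updates keep every factor non-negative.
        F *= (X @ G @ S.T + lam * (W @ F)) / (
            F @ S @ G.T @ G @ S.T + lam * (D @ F) + eps
        )
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


def soft_memberships(M):
    """Scale each row to sum to 1 so entries read as membership strengths."""
    return M / (M.sum(axis=1, keepdims=True) + 1e-12)


if __name__ == "__main__":
    # Toy relation matrix: two row groups linked to two column groups.
    X = np.zeros((6, 4))
    X[:3, :2] = 1.0
    X[3:, 2:] = 1.0
    X += 0.01  # keep all entries strictly positive

    F, S, G = nmtf(X, k_rows=2, k_cols=2, n_iter=300)
    print(np.round(soft_memberships(F), 3))
    print("reconstruction error:", np.linalg.norm(X - F @ S @ G.T))
```

Row-normalizing `F` gives a strength of membership for each node in each cluster rather than a single hard label, which is the sense in which the abstract calls the result a soft co-clustering.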
Keywords/Search Tags: Heterogeneous Information Network, Data mining, Co-Clustering, Nonnegative Matrix Tri-factorization