Research On Evolutionary Collaborative Clustering Algorithm Of Heterogeneous Information Network

Posted on: 2015-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: H Tong
GTID: 2308330461474657
Subject: Computer application technology
Abstract/Summary:
Information networks are pervasive and have become an important part of modern information infrastructure. Information networks in real-life applications are often complex: a single type of relationship neither characterizes the structure of the network well nor provides enough basis for knowledge discovery. Departing from the many existing network models that view data as homogeneous graphs or networks, the semi-structured heterogeneous information network model leverages the rich semantics of typed nodes and links and can uncover surprisingly rich knowledge from interconnected data. Clustering analysis of heterogeneous information networks can reveal potential associations between objects from different research areas, giving a better understanding of the hidden global structure of the entire network, and refining the clusters in one area can help improve clustering accuracy in that field.

In this thesis, we explore the mining of heterogeneous information networks from the following three aspects.

First, for bipartite heterogeneous information networks, we propose a ranking-based collaborative clustering algorithm. RankClus combines ranking with clustering and, through iterative mutual enhancement, achieves good results. However, RankClus gives clustering results only for a specified target type, so it cannot present the complete structure of the global communities. To obtain clustering results for the different types simultaneously, we propose a novel iterative clustering algorithm, RankCoClus, which incorporates ranking with collaboration. First, the ranking distribution matrix is generated by a ranking distribution generation model based on posterior probability; then co-clustering methods are used to cluster objects of the different types synchronously.
Experimental results on both a real data set and synthetic data sets show that the proposed algorithm achieves better performance than RankClus and the classic co-clustering algorithm.

Second, for heterogeneous information networks with a star network schema, we propose the SNetCoClus algorithm. To extend the bipartite heterogeneous network schema, the thesis first defines the star network schema with self-relationships and introduces a method for generating the ranking distribution matrix under this schema. Then, building on the non-negative matrix tri-factorization technique with some adaptation, we obtain a multi-type collaborative clustering method suited to the star network schema. Experimental results under both the bipartite heterogeneous network schema and the ordinary star network schema confirm the validity of the SNetCoClus algorithm.

Finally, going a step further, the ENetCoClus algorithm is proposed for dynamic heterogeneous information networks. By analyzing the evolutionary clustering framework introduced by Chakrabarti et al., we adopt the concept of temporal smoothing. Unlike traditional evolutionary clustering, however, ENetCoClus applies temporal smoothing directly to the set of relationship matrices, which avoids comparing clustering results across time steps and the adjustments arising therefrom. By incorporating the collaborative idea, ENetCoClus not only fully exploits the rich information embedded in the nodes and links of a heterogeneous information network but also solves the similarity measurement problem under the network graph model.
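
As a rough illustration of smoothing the relationship matrices themselves rather than comparing clustering results, the sketch below blends each snapshot's relation matrix with the previously smoothed one and then co-clusters rows and columns with a plain non-negative tri-factorization. It is not the thesis's ENetCoClus: the function names, the smoothing weight `alpha`, the linear blending rule, and the unconstrained multiplicative updates are all assumptions made for illustration, and the ranking step is omitted entirely.

```python
import numpy as np


def smooth_relation(current, previous_smoothed, alpha=0.3):
    """Blend the current relation matrix with the previous smoothed one.

    alpha is a hypothetical history weight in [0, 1]; alpha=0 ignores history.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * current + alpha * previous_smoothed


def tri_factorize(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative X as F @ S @ G.T via multiplicative updates.

    These are the standard unconstrained updates for the squared error
    ||X - F S G^T||^2; the thesis may use a different variant.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_rows)) + eps
    S = rng.random((k_rows, k_cols)) + eps
    G = rng.random((m, k_cols)) + eps
    for _ in range(n_iter):
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


def co_cluster_snapshots(snapshots, k_rows, k_cols, alpha=0.3):
    """Return one (row_labels, col_labels) pair per snapshot, in time order."""
    smoothed = None
    results = []
    for X in snapshots:
        X = np.asarray(X, dtype=float)
        # The first snapshot has no history, so it is used as-is.
        smoothed = X if smoothed is None else smooth_relation(X, smoothed, alpha)
        F, _, G = tri_factorize(smoothed, k_rows, k_cols)
        # Assign each row/column object to its strongest latent factor.
        results.append((F.argmax(axis=1), G.argmax(axis=1)))
    return results
```

Because the smoothing acts on the input matrices, consecutive clusterings are never matched or compared against each other; any temporal consistency comes only from the blended relation data.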
Keywords/Search Tags:Co-clustering, Evolutionary clustering, Ranking, NMF, Heterogeneous information networks