
Clustering Heterogeneous Information Networks Based On Tensor Decomposition

Posted on: 2018-10-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J B Wu
Full Text: PDF
GTID: 1360330623450360
Subject: Management Science and Engineering
Abstract/Summary:
Information networks are ubiquitous in real-world applications, such as social media networks, online e-commerce systems, biological information networks, health care information systems, and most database systems. Effective mining and analysis of information networks therefore poses an interesting and critical challenge, and in recent years information network mining has become an active research focus in data mining and information retrieval. Unfortunately, most existing approaches, such as spectral clustering, are designed to analyze homogeneous information networks, which are composed of only one type of object and one type of link. In real-world scenarios, however, information networks are usually heterogeneous: they contain multiple types of objects and multiple types of links between objects.

Clustering analysis is an unsupervised learning method in data mining, and a practical and indispensable way to analyze data in machine learning and artificial intelligence. It aims to group unlabeled data automatically and can reveal interpretable hidden patterns and structures in large-scale datasets. However, most existing clustering algorithms, such as spectral clustering, are designed to analyze discrete point sets or homogeneous information networks. Traditional methods for analyzing the multiple types of objects and semantic relationships in heterogeneous information networks convert such networks into homogeneous information networks by embedding the objects into a Euclidean space. Such transformations ignore the explicit dependence across the different object and link types, which inevitably causes a loss of semantic information and damages the network structure. Due to these limitations of traditional clustering methods, some recent studies focus directly on heterogeneous information networks and have produced methods such as RankClus and NetClus. Although these newer methods overcome the limitations of traditional methods, their applicability is also very limited because of strict conditions and strong assumptions. For example, RankClus can only be used to model bi-typed networks, while NetClus was developed for the star network schema. Unfortunately, few real-world heterogeneous information networks follow such simple and perfectly defined network schemas.

In this dissertation, to overcome the limitations of traditional clustering methods and the disadvantages of current clustering methods for heterogeneous information networks, we study the clustering of heterogeneous information networks with general network schemas based on tensor decomposition. A novel model of heterogeneous information networks based on tensor representation is proposed. Then, a generic, network-schema-agnostic sparse tensor factorization framework for single-pass clustering, a tensor decomposition based clustering method with a sparsity constraint, and a multi-typed community discovery model for dynamic heterogeneous information networks are designed. The main research work and innovations include:

1. Based on related work on mining heterogeneous information networks, we summarize the challenges in clustering heterogeneous information networks and propose a novel model of heterogeneous information networks through tensor representation, without restrictions on the network schema. In the tensor representation, the semantic relationships across different types of objects are modeled via the high-order property of the tensor. By exploiting the sparsity of the tensor, we can compress the storage of heterogeneous information networks.

2. We study a generic, network-schema-agnostic, single-pass clustering framework for heterogeneous information networks, and propose the STFClus (Sparse Tensor Factorization based Clustering) algorithm. In this clustering framework, no distance function for measuring similarity between pairs of objects is needed, and multiple types of objects can be clustered simultaneously in a single pass.

3. We propose two efficient stochastic gradient descent algorithms that use the sparsity of the tensor to accelerate computation and can guarantee the sparsity of the factor matrices in the clustering results.

4. We study multi-typed community discovery in dynamic heterogeneous information networks. Based on the properties of multi-typed communities in heterogeneous information networks, which contain multiple types of dynamic objects and links, we model each multi-typed community as a rank-one tensor and propose a method for automatically determining the number of multi-typed communities.
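
As a rough illustration of the tensor representation and the rank-one community model described in the abstract, the following minimal Python sketch stores a toy heterogeneous network as a sparse third-order tensor and builds one community as an outer product of membership vectors. The object types (authors, papers, venues), the example relations, and the helper `rank_one` are all hypothetical choices made for this sketch; it is not the STFClus algorithm or any other method from the dissertation.

```python
from itertools import product

# Hypothetical toy network with three object types (an assumption for illustration).
authors = ["a0", "a1", "a2"]
papers = ["p0", "p1", "p2", "p3"]
venues = ["v0", "v1"]

# Each observed semantic relationship "author wrote paper published at venue"
# is one index triple (author, paper, venue).
relations = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 2, 1), (2, 3, 1)]

# Sparse tensor: only nonzero entries are stored, keyed by their index triple.
sparse_tensor = {idx: 1.0 for idx in relations}

shape = (len(authors), len(papers), len(venues))
dense_size = shape[0] * shape[1] * shape[2]
density = len(sparse_tensor) / dense_size


def rank_one(u, v, w):
    """Return the nonzero entries of the outer product of u, v and w."""
    entries = {}
    for i, j, k in product(range(len(u)), range(len(v)), range(len(w))):
        value = u[i] * v[j] * w[k]
        if value != 0:
            entries[(i, j, k)] = value
    return entries


# One multi-typed community as a rank-one tensor: binary membership vectors
# select authors a0 and a1, papers p0 and p1, and venue v0.
community = rank_one([1, 1, 0], [1, 1, 0, 0], [1, 0])

# Observed relations that fall inside this community.
covered = sorted(idx for idx in sparse_tensor if idx in community)

print(f"stored entries: {len(sparse_tensor)} of {dense_size} (density {density:.3f})")
print("relations covered by the community:", covered)
```

Storing only the observed index triples is what makes the representation compact when most combinations of objects never co-occur, and a sum of several such rank-one terms is the general form that a CP-style tensor decomposition fits to the data.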
Keywords/Search Tags: Heterogeneous information network, Clustering, Tensor decomposition, Stochastic tensor gradient descent, Multi-typed community