Research Of Incorporating Side Information Into Multivariate IB Method For Multi-view Clustering

Posted on: 2015-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: R N Liu
Full Text: PDF
GTID: 2298330431995526
Subject: Computer software and theory
Abstract/Summary:
The data in most current real-world applications are often complex and high-dimensional, and usually admit multiple reasonable clusterings. Analyzing the data from different views can help us understand them more comprehensively. However, traditional clustering algorithms focus on learning a single good clustering solution, which makes it difficult to interpret complex data accurately. This issue has recently led to the emerging research area of multi-view clustering, which tries to discover the multiple clustering solutions residing in data. Existing multi-view clustering algorithms have several problems: they either cannot incorporate known clustering partitions or can incorporate only one, they are applicable to limited types of data, and they require parameters that are not easy to choose in advance, among other issues. To solve these problems, this paper incorporates side information into the multivariate information bottleneck (IB) method and proposes a new objective-function-oriented multi-view clustering algorithm, named SmIB, which iteratively discovers multiple non-redundant, high-quality clustering solutions given one or more existing clustering partitions. The SmIB algorithm takes the known reference clustering partitions as side information and incorporates this information into the multivariate IB method. On the one hand, following the basic idea of the multivariate IB method, it uses two Bayesian networks to specify the trade-off terms, namely which variables to compress and which information terms to maintain, and it preserves as much of the relevant feature information of the data as possible during clustering, which yields high-quality clustering solutions. On the other hand, it takes known data partitions as side information and integrates them into the Bayesian networks to constrain the target clustering results, so that the resulting clustering solutions are non-redundant with respect to the existing clustering partitions.
The SmIB algorithm uses mutual information and the nonparametric MeanNN differential entropy estimator to measure the preserved relevant information, which makes it suitable for analyzing both co-occurrence data and Euclidean-space data. In addition, the SmIB algorithm can discover both linear and non-linear clustering partitions residing in data. Experimental results on synthetic, co-occurrence, and Euclidean-space datasets demonstrate that the SmIB algorithm can effectively discover multiple reasonable clustering solutions in different types of data. Its performance is superior to that of existing state-of-the-art traditional clustering algorithms and of three existing multi-view clustering algorithms.
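To make the two information measures named above concrete, the following is a minimal Python sketch, not the thesis's implementation. It assumes NumPy is available; the function names `mutual_information` and `meannn_entropy` are hypothetical. The first function computes mutual information from a co-occurrence count matrix. The second computes the pairwise-log-distance form of the MeanNN estimator with its additive constant omitted, and assumes all points are distinct.

```python
import numpy as np


def mutual_information(joint_counts):
    """Mutual information I(X;Y) in nats from a co-occurrence count matrix."""
    p_xy = np.asarray(joint_counts, dtype=float)
    p_xy = p_xy / p_xy.sum()                 # normalize counts to a joint distribution
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal over rows, shape (n, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal over columns, shape (1, m)
    independent = p_x @ p_y                  # product of marginals, shape (n, m)
    mask = p_xy > 0                          # treat 0 * log(0) as 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / independent[mask])))


def meannn_entropy(points):
    """MeanNN-style differential entropy estimate, up to an additive constant.

    Assumption: the estimate has the form
        d / (n * (n - 1)) * sum_{i != j} log ||x_i - x_j||
    and the dimension- and sample-size-dependent constant is dropped.
    Points must be distinct, otherwise log(0) appears.
    """
    x = np.asarray(points, dtype=float)
    n, d = x.shape
    diffs = x[:, None, :] - x[None, :, :]            # all pairwise differences
    dists = np.sqrt((diffs ** 2).sum(axis=-1))       # pairwise Euclidean distances
    off_diagonal = ~np.eye(n, dtype=bool)            # exclude i == j
    return float(d / (n * (n - 1)) * np.log(dists[off_diagonal]).sum())
```

Because the constant is dropped, `meannn_entropy` is only meaningful for comparing point sets of the same size and dimension, for example the within-cluster entropies of candidate partitions of Euclidean-space data.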
Keywords/Search Tags: multi-view clustering, side information, multivariate IB method, mutual information, MeanNN differential entropy estimator