
On the clustering of heterogeneous data: A graph theoretic approach

Posted on: 2008-05-11
Degree: Ph.D
Type: Dissertation
University: Wayne State University
Candidate: Rege, Manjeet
Full Text: PDF
GTID: 1448390005457584
Subject: Computer Science
Abstract/Summary:
Data clustering is the classification of data objects into different groups (clusters) such that objects in the same group are similar to one another and dissimilar from objects in other groups. Homogeneous clustering deals with objects of a single type. Heterogeneous clustering, on the other hand, refers to the problem of clustering objects belonging to more than one data type. The contribution of this dissertation is three-fold: homogeneous clustering of images, pairwise heterogeneous data co-clustering, and high-order star-structured heterogeneous data co-clustering.

First, image clustering achieved by applying traditional clustering algorithms to visual features suffers from the semantic gap problem. We propose a semantic-based hierarchical image clustering framework based on multi-user feedback. By treating each user as an independent weak classifier, we show that combining multi-user feedback is equivalent to combining independent weak classifiers. We have achieved superior results compared to other typical methods for organizing an image database.

Second, we present a novel graph theoretic approach to pairwise heterogeneous data co-clustering. The two data types are modeled as the two vertex sets of a weighted bipartite graph. We then propose the Isoperimetric Co-clustering Algorithm (ICA), a new method for partitioning the bipartite graph. ICA requires only the solution of a sparse system of linear equations instead of the eigenvalue or SVD problem solved in the popular spectral co-clustering approach. Our theoretical analysis and extensive experiments on publicly available datasets demonstrate the advantages of ICA over the spectral approach in terms of quality, efficiency, and stability in partitioning the bipartite graph.

Lastly, for high-order heterogeneous co-clustering, we propose the Consistent Isoperimetric High-Order Co-clustering (CIHC) framework to address star-structured co-clustering problems, in which a central data type is connected to all the other data types. We model this kind of data using a k-partite graph and partition it by treating it as a fusion of multiple bipartite graphs. Experiments on text corpora and real Web images show that CIHC outperforms existing algorithms for partitioning the star-structured graph.
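
The abstract does not spell out the linear system that ICA solves, so the following is only a minimal sketch of the idea described for pairwise co-clustering. It follows the general isoperimetric graph partitioning recipe (ground one vertex, solve a reduced Laplacian system, threshold the solution) applied to a bipartite graph built from a data matrix. The function name `isoperimetric_cocluster`, the choice of the highest-degree vertex as ground, the threshold sweep, and the toy matrix `A_demo` are all assumptions made for illustration and may differ from the dissertation's actual algorithm. The sketch also assumes the bipartite graph is connected.

```python
import numpy as np

def isoperimetric_cocluster(A):
    """Two-way co-clustering sketch for a nonnegative matrix A (rows x columns).

    Assumes the bipartite graph of A is connected, so the reduced
    Laplacian system below has a unique solution.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape

    # Weighted bipartite graph: rows are vertices 0..m-1, columns m..m+n-1.
    W = np.zeros((m + n, m + n))
    W[:m, m:] = A
    W[m:, :m] = A.T
    d = W.sum(axis=1)            # vertex degrees
    L = np.diag(d) - W           # graph Laplacian

    # Ground the highest-degree vertex (an assumed heuristic) and solve
    # the reduced system L0 x0 = d0; the grounded vertex gets x = 0.
    ground = int(np.argmax(d))
    keep = np.arange(m + n) != ground
    x = np.zeros(m + n)
    x[keep] = np.linalg.solve(L[keep][:, keep], d[keep])

    # Sweep thresholds over the sorted solution and keep the cut with the
    # smallest ratio cut(S, S') / min(vol(S), vol(S')).
    order = np.argsort(x)
    best_ratio, best_mask = np.inf, None
    for k in range(1, m + n):
        mask = np.zeros(m + n, dtype=bool)
        mask[order[:k]] = True
        cut = W[mask][:, ~mask].sum()
        vol = min(d[mask].sum(), d[~mask].sum())
        if vol > 0 and cut / vol < best_ratio:
            best_ratio, best_mask = cut / vol, mask

    labels = best_mask.astype(int)
    return labels[:m], labels[m:]   # row labels, column labels

# Hypothetical toy data: two blocks joined by one weak link (row 3, column 1).
A_demo = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
])
rows, cols = isoperimetric_cocluster(A_demo)
print(rows, cols)
```

The dense `np.linalg.solve` call keeps the sketch short. For a large, sparse bipartite graph, a sparse solver such as `scipy.sparse.linalg.spsolve` would be the natural substitute, which is where the efficiency argument against an eigenvalue or SVD computation comes from.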
Keywords/Search Tags:Data, Clustering, Graph, ICA, Approach, Objects