
Multiple alternative clusterings and dimensionality reduction

Posted on: 2013-07-15    Degree: Ph.D    Type: Dissertation
University: Northeastern University    Candidate: Niu, Donglin    Full Text: PDF
GTID: 1458390008970384    Subject: Artificial Intelligence
Abstract/Summary:
Clustering is the process of grouping objects based on some notion of similarity. It is commonly applied in exploratory data analysis, segmentation, preprocessing, and data summarization. Spectral clustering is a flexible clustering methodology that is applicable to a variety of data types and has the particular virtue of making few assumptions about cluster shapes. It has become popular in a variety of application areas, particularly in computational vision and bioinformatics. The approach appears, however, to be particularly sensitive to irrelevant and noisy dimensions in the data. We therefore introduce an approach that learns the relevant dimensions and the spectral clustering simultaneously, pursuing an augmented form of spectral clustering in which an explicit projection operator is incorporated into the relaxed optimization functional.

Traditional clustering algorithms find only one clustering solution. However, data can be grouped and interpreted in many different ways, and different clustering solutions are interesting for different purposes. Instead of committing to a single solution, we introduce four methods that provide the user with several possible alternative clustering solutions (which we call views) for exploratory data analysis. (1) The first method discovers multiple feature-subspace views by clustering the features through a feature similarity graph. The features relevant to one clustering interpretation may differ from those relevant to an alternative interpretation, and this approach automatically discovers the relevant features in each interpretation. (2) The second method incorporates dimensionality reduction into spectral clustering for each view and penalizes dependence among the different subspace views to ensure non-redundancy; it simultaneously learns non-redundant subspaces that provide multiple views and finds a clustering solution in each view.
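The core idea of combining a projection operator with spectral clustering can be illustrated with a minimal NumPy sketch. This is not the dissertation's algorithm: here the projection W is hand-set to the known relevant dimensions rather than learned jointly inside the relaxed objective, and the data, sigma, and cluster locations are all illustrative assumptions. The point is only that spectral clustering applied to the projected data ignores the noisy, irrelevant dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two clusters separated in dimensions 0-1; dimensions 2-4 are high-variance noise.
a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(30, 2))
b = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(30, 2))
X = np.hstack([np.vstack([a, b]), rng.normal(scale=5.0, size=(60, 3))])

def spectral_bipartition(Z, sigma=1.0):
    """Split the rows of Z into two groups via the normalized graph Laplacian."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))               # Gaussian affinity
    D = K.sum(axis=1)
    L = np.eye(len(Z)) - K / np.sqrt(np.outer(D, D))   # normalized Laplacian
    _, vecs = np.linalg.eigh(L)
    f = vecs[:, 1]                                     # second-smallest eigenvector
    return (f > np.median(f)).astype(int)

# Projection onto the two relevant dimensions (hand-set here; learned in the method).
W = np.zeros((5, 2))
W[0, 0] = W[1, 1] = 1.0
labels = spectral_bipartition(X @ W)
```

In the method described above, W would instead be optimized jointly with the spectral relaxation, so the relevant subspace is discovered rather than assumed.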
(3) Alternative views can be discovered in two modes: simultaneously or iteratively. The first two approaches are simultaneous methods; the third method we developed learns alternative clusterings iteratively. Given one or several previously found or provided solutions, it learns an alternative spectral clustering solution in a lower-dimensional subspace. (4) The fourth method is a nonparametric Bayesian probabilistic model that discovers multiple clustering solutions from data, together with the feature subsets relevant to the clusters in each view. In this model, features may be shared across views, so the sets of relevant features are allowed to overlap. We model feature relevance to each view with an Indian Buffet Process and cluster membership in each view with a Chinese Restaurant Process, and we provide a Markov chain Monte Carlo inference approach to learn the latent parameters of this multiple-partitioning problem. The model not only learns the features and clusters in each view but also automatically infers the number of clusters, the number of views, and the number of features in each view.
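The Chinese Restaurant Process component can be sketched by sampling cluster assignments sequentially: each item joins an existing cluster in proportion to its size or opens a new one, so the number of clusters is inferred from the data rather than fixed in advance. This is a generic CRP prior sampler, not the dissertation's MCMC inference; the concentration parameter alpha is an illustrative assumption, and the Indian Buffet Process used for feature relevance plays an analogous role for overlapping feature sets.

```python
import numpy as np

def crp_assignments(n, alpha, rng):
    """Sample a partition of n items from a Chinese Restaurant Process.

    Item i joins existing cluster k with probability count_k / (i + alpha),
    or starts a new cluster with probability alpha / (i + alpha).
    """
    counts = []   # counts[k] = number of items currently in cluster k
    labels = []
    for i in range(n):
        weights = np.array(counts + [alpha], dtype=float)
        z = rng.choice(len(weights), p=weights / (i + alpha))
        if z == len(counts):
            counts.append(1)      # a new cluster is opened
        else:
            counts[z] += 1
        labels.append(int(z))
    return labels

rng = np.random.default_rng(0)
labels = crp_assignments(100, alpha=1.0, rng=rng)
```

Because new clusters keep a fixed probability mass alpha / (i + alpha), the expected number of clusters grows only logarithmically with n, which is what lets the model adapt the number of clusters per view.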
Keywords/Search Tags: Clustering, Each view, Alternative, Multiple, Features, Data, Learns