
Multiple alternative clusterings and dimensionality reduction

Posted on: 2013-07-15    Degree: Ph.D    Type: Dissertation
University: Northeastern University    Candidate: Niu, Donglin    Full Text: PDF
GTID: 1458390008970384    Subject: Artificial Intelligence
Abstract/Summary:
Clustering is the process of grouping objects based on some notion of similarity. It is commonly applied in exploratory data analysis, segmentation, preprocessing, and data summarization. Spectral clustering is a flexible clustering methodology that is applicable to a variety of data types and has the particular virtue of making few assumptions about cluster shapes. It has become popular in a variety of application areas, particularly in computational vision and bioinformatics. The approach appears, however, to be particularly sensitive to irrelevant and noisy dimensions in the data. We therefore introduce an approach that learns the relevant dimensions and the spectral clustering simultaneously, pursuing an augmented form of spectral clustering in which an explicit projection operator is incorporated into the relaxed optimization functional.

Traditional clustering algorithms find only one clustering solution. However, data can be grouped and interpreted in many different ways, and different clustering solutions are interesting for different purposes. Instead of committing to a single solution, we introduce four methods that provide the user with several possible alternative clustering solutions (which we call views) for exploratory data analysis. (1) The first method discovers multiple feature-subspace views by clustering the features through a feature similarity graph. The features relevant to one clustering interpretation may differ from those relevant to an alternative interpretation, and this approach automatically discovers the relevant features in each interpretation. (2) The second method incorporates dimensionality reduction into spectral clustering for each view and penalizes dependence among the different subspace views to ensure non-redundancy; it simultaneously learns non-redundant subspaces that provide multiple views and finds a clustering solution in each view.
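The core idea of combining a projection operator with spectral clustering can be illustrated with a minimal NumPy sketch. This is not the dissertation's algorithm: here the projection W is hand-set to the known relevant dimensions rather than learned jointly inside the relaxed objective, and the data, sigma, and cluster locations are all illustrative assumptions. The point is only that spectral clustering applied to the projected data ignores the noisy, irrelevant dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two clusters separated in dimensions 0-1; dimensions 2-4 are high-variance noise.
a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(30, 2))
b = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(30, 2))
X = np.hstack([np.vstack([a, b]), rng.normal(scale=5.0, size=(60, 3))])

def spectral_bipartition(Z, sigma=1.0):
    """Split the rows of Z into two groups via the normalized graph Laplacian."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))               # Gaussian affinity
    D = K.sum(axis=1)
    L = np.eye(len(Z)) - K / np.sqrt(np.outer(D, D))   # normalized Laplacian
    _, vecs = np.linalg.eigh(L)
    f = vecs[:, 1]                                     # second-smallest eigenvector
    return (f > np.median(f)).astype(int)

# Projection onto the two relevant dimensions (hand-set here; learned in the method).
W = np.zeros((5, 2))
W[0, 0] = W[1, 1] = 1.0
labels = spectral_bipartition(X @ W)
```

In the method described above, W would instead be optimized jointly with the spectral relaxation, so the relevant subspace is discovered rather than assumed.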
(3) Alternative views can be discovered in two modes: simultaneously or iteratively. The first two approaches are simultaneous methods; the third method we developed learns alternative clusterings iteratively. Given one or several previously found or provided solutions, it learns an alternative spectral clustering solution in a lower-dimensional subspace. (4) The fourth method is a nonparametric Bayesian probabilistic model that discovers multiple clustering solutions from data, together with the feature subsets relevant to the clusters in each view. In this model, features may be shared across views, so the sets of relevant features are allowed to overlap. We model feature relevance to each view with an Indian Buffet Process and cluster membership in each view with a Chinese Restaurant Process, and we provide a Markov chain Monte Carlo inference approach to learn the latent parameters of this multiple-partitioning problem. The model not only learns the features and clusters in each view but also automatically infers the number of clusters, the number of views, and the number of features in each view.
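The Chinese Restaurant Process component can be sketched by sampling cluster assignments sequentially: each item joins an existing cluster in proportion to its size or opens a new one, so the number of clusters is inferred from the data rather than fixed in advance. This is a generic CRP prior sampler, not the dissertation's MCMC inference; the concentration parameter alpha is an illustrative assumption, and the Indian Buffet Process used for feature relevance plays an analogous role for overlapping feature sets.

```python
import numpy as np

def crp_assignments(n, alpha, rng):
    """Sample a partition of n items from a Chinese Restaurant Process.

    Item i joins existing cluster k with probability count_k / (i + alpha),
    or starts a new cluster with probability alpha / (i + alpha).
    """
    counts = []   # counts[k] = number of items currently in cluster k
    labels = []
    for i in range(n):
        weights = np.array(counts + [alpha], dtype=float)
        z = rng.choice(len(weights), p=weights / (i + alpha))
        if z == len(counts):
            counts.append(1)      # a new cluster is opened
        else:
            counts[z] += 1
        labels.append(int(z))
    return labels

rng = np.random.default_rng(0)
labels = crp_assignments(100, alpha=1.0, rng=rng)
```

Because new clusters keep a fixed probability mass alpha / (i + alpha), the expected number of clusters grows only logarithmically with n, which is what lets the model adapt the number of clusters per view.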
Keywords/Search Tags: Clustering, Each view, Alternative, Multiple, Features, Data, Learns