
Subspace clustering based on fuzzy models and mean shifts

Posted on: 2008-02-19
Degree: Ph.D.
Type: Dissertation
University: York University (Canada)
Candidate: Gan, Guojun
Full Text: PDF
GTID: 1448390005473894
Subject: Mathematics
Abstract/Summary:
Cluster analysis creates groups of objects, or clusters, such that objects in the same cluster are very similar and objects in different clusters are quite distinct. It has found applications in many areas, such as text mining, pattern recognition, gene expression analysis, customer segmentation, and image processing. However, cluster analysis is a complex task that faces many challenges, such as the curse of dimensionality and the unknown number of clusters.

This dissertation introduces several novel approaches that overcome some limitations of existing algorithms for clustering high-dimensional data sets. It makes four specific contributions:
(a) The FSC Algorithm. The fuzzy subspace clustering (FSC) algorithm is a novel method for clustering high-dimensional data sets. In this algorithm, we fuzzify the dimensions rather than the class membership.
(b) Convergence of the FSC Algorithm. The convergence of the FSC algorithm is established via Zangwill's convergence theorem. It is shown that the iteration sequence produced by the FSC algorithm either terminates at a point in the solution set S or has a subsequence converging to a point in S.
(c) The MSSC Algorithm. While the FSC algorithm is developed primarily to deal with the curse of dimensionality, the MSSC (Mean Shift for Subspace Clustering) algorithm is developed to address the problem of determining the number of clusters. The MSSC algorithm uses the idea behind the FSC algorithm to recover subspace clusters and, at the same time, tries to find the correct number of subspace clusters.
(d) Bifurcations of the MSSC Algorithm. The MSSC algorithm involves a parameter beta. As beta → 0, the MSSC algorithm produces a single cluster containing all the data points, while as beta → infinity it produces k distinct clusters, where k is the number of initial centers. In other words, the single cluster splits into smaller clusters as beta increases. The critical value of beta at which the first phase transition occurs is approximated.

The performance of the algorithms is demonstrated through extensive experimental evaluations on a variety of synthetic data sets.
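
The limiting behaviour in beta described in contribution (d) can be illustrated with a generic soft-assignment center update. This is only a sketch under stated assumptions: the Gibbs weights exp(-beta * squared distance), the function name soft_center_update, and the toy data are illustrative choices, not the MSSC update from the dissertation, and the subspace (dimension) weighting is omitted entirely.

```python
import math

def soft_center_update(points, centers, beta, n_iter=50):
    """Repeat a Gibbs-weighted mean update of the centers (illustrative only).

    Hypothetical helper, not the dissertation's MSSC iteration. It assumes
    every center keeps a nonzero total weight, which holds for the toy data.
    """
    centers = [list(c) for c in centers]
    k, dim = len(centers), len(centers[0])
    for _ in range(n_iter):
        weight_sum = [0.0] * k
        weighted = [[0.0] * dim for _ in range(k)]
        for x in points:
            # squared Euclidean distance from x to every center
            d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            # membership of x in center j is proportional to exp(-beta * d2[j]);
            # subtracting the smallest distance keeps the exponentials stable
            d_min = min(d2)
            u = [math.exp(-beta * (d - d_min)) for d in d2]
            total = sum(u)
            for j in range(k):
                w = u[j] / total
                weight_sum[j] += w
                for h in range(dim):
                    weighted[j][h] += w * x[h]
        # each center moves to the membership-weighted mean of the data
        centers = [[s / weight_sum[j] for s in weighted[j]] for j in range(k)]
    return centers

# Toy data (assumed for illustration): two well-separated groups of three points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
start = [(0, 0), (10, 10)]

merged = soft_center_update(points, start, beta=1e-9)  # tiny beta
split = soft_center_update(points, start, beta=10.0)   # large beta
```

With the tiny beta, every point weights both centers almost equally, so both centers settle near the global mean and the data behave as a single cluster. With the large beta, the memberships become effectively hard and each center settles on the mean of its own group.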
Keywords/Search Tags: Cluster, MSSC algorithm, FSC algorithm, Beta